Content is the fundamental unit of data in Sync—a file paired with its AI-generated derivatives. Each piece of content represents a document, image, video, CAD file, or any other file type, along with all the structured data extracted from it.
Content in Sync is more than just a file. It's a comprehensive data object that includes:
- Original file: The raw uploaded file (PDF, image, spreadsheet, etc.)
- Extracted text: Full text extracted using specialized parsers
- Vector embeddings: Semantic representations for AI-powered search
- Structured metadata: Both user-provided and AI-extracted fields
- Derivatives: Thumbnails, previews, and processed outputs
Unlike traditional file storage systems that only track files and basic metadata, Sync treats every piece of content as a rich, queryable data object that can be searched semantically, filtered by structured fields, and analyzed using SQL.
Content moves through a two-phase lifecycle:
Phase 1: Upload
POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}
- File is stored in blob storage
- User-provided metadata is stored
- Content record is created
- Not yet searchable or AI-ready
Phase 2: Ingestion
POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/{contentId}/ingest
- Text extraction and parsing
- Chunking and embedding generation
- Ontology-driven queries execute (if defined)
- Content becomes fully searchable and queryable
This separation allows you to batch uploads and then trigger ingestion when ready, providing control over when compute-intensive AI processing occurs.
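Because upload and ingestion are separate calls, a batch job can store every file first and trigger ingestion afterwards. The sketch below is a minimal illustration under stated assumptions: only the two endpoint paths come from this page, while `send(method, url, file_name)` is a hypothetical caller-supplied function standing in for your HTTP client (it is assumed to build the multipart body, attach the `Authorization` header, and return the parsed JSON response).

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical transport: performs one HTTP request (multipart body and
# Authorization header included) and returns the parsed JSON response.
Send = Callable[[str, str, Optional[str]], Dict[str, Any]]


def content_base_url(workspace_id: str, dataspace_id: str) -> str:
    """Base content URL, following the endpoint pattern shown above."""
    return f"https://sws-{workspace_id}.cloud.syncdocs.ai/api/content/{dataspace_id}"


def batch_upload_then_ingest(
    workspace_id: str, dataspace_id: str, file_names: List[str], send: Send
) -> List[str]:
    base = content_base_url(workspace_id, dataspace_id)
    # Phase 1: store every file. Each upload response carries a contentId,
    # but the content is not yet searchable.
    content_ids = [send("POST", base, name)["contentId"] for name in file_names]
    # Phase 2: trigger the compute-intensive ingestion only after the
    # whole batch is stored.
    for content_id in content_ids:
        send("POST", f"{base}/{content_id}/ingest", None)
    return content_ids
```

Injecting `send` keeps the sequencing logic testable with a fake that records calls. Ingestion is issued one item at a time here; whether the ingest endpoint accepts concurrent calls is not covered on this page.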
Each content object has the following fields:
Core Fields:
- contentId (UUID): Unique identifier
- dataspaceId (string): The dataspace containing this content
- fileName (string): Original file name
- fileFormat (string): MIME type (e.g., application/pdf)
- createdAt (timestamp): Upload time
- updatedAt (timestamp): Last modification time
Enrichment Fields:
- categoryId (UUID, nullable): Category from the ontology taxonomy
- metadata (JSON object): Arbitrary user-provided and AI-extracted data
- inferenceTaskExecutions (object): Record of AI operations performed
- fileUrl (string): URL to download the original file
- fileSize (number): File size in bytes
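In client code it helps to map these fields onto a typed record and treat anything that may be missing as optional. A minimal Python sketch, assuming the camelCase JSON keys listed above; the `Content` class and `from_dict` method are hypothetical names. The upload response on this page omits updatedAt, fileUrl, and fileSize, so those default to `None` here.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Dict, Optional


def parse_timestamp(value: str) -> datetime:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11,
    # so rewrite it as an explicit UTC offset for older versions.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


@dataclass
class Content:
    content_id: str
    dataspace_id: str
    file_name: str
    file_format: str
    created_at: datetime
    # The fields below may be absent from some responses.
    updated_at: Optional[datetime] = None
    category_id: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    inference_task_executions: Dict[str, str] = field(default_factory=dict)
    file_url: Optional[str] = None
    file_size: Optional[int] = None

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "Content":
        updated = raw.get("updatedAt")
        return cls(
            content_id=raw["contentId"],
            dataspace_id=raw["dataspaceId"],
            file_name=raw["fileName"],
            file_format=raw["fileFormat"],
            created_at=parse_timestamp(raw["createdAt"]),
            updated_at=parse_timestamp(updated) if updated else None,
            category_id=raw.get("categoryId"),
            metadata=raw.get("metadata") or {},
            inference_task_executions=raw.get("inferenceTaskExecutions") or {},
            file_url=raw.get("fileUrl"),
            file_size=raw.get("fileSize"),
        )
```

Pass the result of `json.loads(response_body)` to `Content.from_dict` to get a record whose timestamps are timezone-aware `datetime` values.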
Example Content Object:
{
"contentId": "550e8400-e29b-41d4-a716-446655440000",
"dataspaceId": "sds-abc12345",
"fileName": "contract-acme-2024.pdf",
"fileFormat": "application/pdf",
"categoryId": "cat-contract-uuid",
"metadata": {
"customerName": "Acme Corp",
"contractType": "Master Service Agreement",
"effectiveDate": "2024-01-15",
"annualValue": 250000,
"extractedParties": ["Acme Corp", "Tech Innovations LLC"]
},
"inferenceTaskExecutions": {
"effectiveDate": "task-execution-uuid-1",
"extractedParties": "task-execution-uuid-2"
},
"fileUrl": "https://sws-12345678.cloud.syncdocs.ai/api/content/sds-abc12345/550e8400.../download",
"fileSize": 2048576,
"createdAt": "2024-10-28T10:30:00Z",
"updatedAt": "2024-10-28T10:35:00Z"
}
Content is created via multipart form upload to the Workspace API.
POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}
Content-Type: multipart/form-data
Authorization: Bearer <token>
{
"file": <binary data>,
"categoryId": "cat-contract-uuid",
"fileFormat": "application/pdf",
"fileName": "contract-acme-2024.pdf",
"metadata": {
"customerName": "Acme Corp",
"contractType": "Master Service Agreement",
"signedDate": "2024-01-10",
"internalId": "CONT-2024-0042"
}
}
Response:
{
"contentId": "550e8400-e29b-41d4-a716-446655440000",
"dataspaceId": "sds-abc12345",
"fileName": "contract-acme-2024.pdf",
"fileFormat": "application/pdf",
"categoryId": "cat-contract-uuid",
"metadata": {
"customerName": "Acme Corp",
"contractType": "Master Service Agreement",
"signedDate": "2024-01-10",
"internalId": "CONT-2024-0042"
},
"createdAt": "2024-10-28T10:30:00Z"
}
By default, Sync validates that:
- The specified categoryId exists in the dataspace's ontology
- All required metadata fields (defined in the ontology) are provided
- Metadata field types match ontology definitions
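These three checks can be mirrored on the client to catch problems before a large upload. The sketch below is hypothetical: the simplified `Ontology` mapping (categoryId to field name, expected Python type, and a required flag) is an illustration only, not Sync's ontology format, and the server's validation remains authoritative.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical, simplified ontology shape used only for this sketch:
# categoryId -> {field name: (expected Python type, required?)}
Ontology = Dict[str, Dict[str, Tuple[type, bool]]]

CONTRACTS: Ontology = {
    "cat-contract-uuid": {
        "customerName": (str, True),
        "contractType": (str, True),
        "annualValue": (int, False),
    }
}


def precheck_upload(
    category_id: str, metadata: Dict[str, Any], ontology: Ontology
) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    # Check 1: the category must exist in the ontology.
    if category_id not in ontology:
        return [f"unknown categoryId: {category_id}"]
    problems: List[str] = []
    for name, (expected, required) in ontology[category_id].items():
        # Check 2: required fields must be present.
        if name not in metadata:
            if required:
                problems.append(f"missing required field: {name}")
            continue
        value = metadata[name]
        # Check 3: types must match. bool is a subclass of int in Python,
        # so a bool only satisfies a field that explicitly expects bool.
        wrong_bool = isinstance(value, bool) and expected is not bool
        if wrong_bool or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return problems
```

For example, `precheck_upload("cat-contract-uuid", {"customerName": "Acme Corp"}, CONTRACTS)` reports the missing contractType before any file is sent.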
To skip validation (useful for bulk imports or partial data):
POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}?validateRequiredMetadata=false
Content metadata is a flexible JSON object that can contain any fields you need.
User-Provided Metadata: Added during upload, with a completely arbitrary structure. No schema is required unless you define an ontology.
{
"customerName": "Acme Corp",
"priority": "high",
"tags": ["urgent", "contract"],
"customFields": { "any": "structure" }
}
AI-Extracted Metadata: Added during ingestion if an ontology with metadata queries is defined. These fields become queryable alongside user-provided data.
{
"customerName": "Acme Corp", // User-provided
"effectiveDate": "2024-01-15", // AI-extracted
"parties": ["Acme Corp", "Tech LLC"], // AI-extracted
"termLength": 36 // AI-extracted
}
Both types of metadata are stored in the same metadata field and are equally queryable via the API or SQL.
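Because both kinds of metadata share one field, a client may still want to tell them apart, for example to show which values came from AI extraction. A minimal sketch, assuming that a metadata key is AI-extracted exactly when it also appears as a key of inferenceTaskExecutions, as in the example content object earlier on this page; the page does not state this rule explicitly, and `split_metadata` is a hypothetical helper.

```python
from typing import Any, Dict, Tuple


def split_metadata(content: Dict[str, Any]) -> Tuple[Dict[str, Any], Dict[str, Any]]:
    """Return (user_provided, ai_extracted) views of content["metadata"].

    Assumption: a key is AI-extracted when it is also a key of
    inferenceTaskExecutions; every other key is treated as user-provided.
    """
    metadata = content.get("metadata") or {}
    executions = content.get("inferenceTaskExecutions") or {}
    user = {k: v for k, v in metadata.items() if k not in executions}
    ai = {k: v for k, v in metadata.items() if k in executions}
    return user, ai
```

Both views are plain copies, so the original content object is left unchanged.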
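List content in a dataspace, optionally filtered by category and paginated with a cursor:
GET https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}?category=cat-contract-uuid&pageSize=50&cursor={nextPageCursor}
Retrieve a single content object:
GET https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/{contentId}
Download the original file:
GET https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/{contentId}/download
Query content metadata with SQL:
-- Find all contracts over $200k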
SELECT
id as content_id,
file_name,
metadata->>'customerName' as customer,
(metadata->>'annualValue')::numeric as value
FROM content
WHERE category_id = 'cat-contract-uuid'
AND (metadata->>'annualValue')::numeric > 200000
ORDER BY (metadata->>'annualValue')::numeric DESC;
Search content semantically, optionally combined with structured filters:
POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/query
{
"query": "Find all contracts with auto-renewal clauses",
"filters": {
"categoryId": "cat-contract-uuid"
}
}
Next steps:
- Understand Queries - Learn how to search and analyze content
- Explore Ontologies - Define schemas and extraction rules
- Learn about Projects - Organize content into business workflows
- API Reference - Complete API documentation