# Content

**Content** is the fundamental unit of data in Sync: a file paired with its AI-generated derivatives. Each piece of content represents a document, image, video, CAD file, or any other file type, along with all the structured data extracted from it.

## What is Content?

Content in Sync is more than just a file. It's a comprehensive data object that includes:

- **Original file**: The raw uploaded file (PDF, image, spreadsheet, etc.)
- **Extracted text**: Full text extracted using specialized parsers
- **Vector embeddings**: Semantic representations for AI-powered search
- **Structured metadata**: Both user-provided and AI-extracted fields
- **Derivatives**: Thumbnails, previews, and processed outputs

Unlike traditional file storage systems that only track files and basic metadata, Sync treats every piece of content as a rich, queryable data object that can be searched semantically, filtered by structured fields, and analyzed using SQL.

## Content Lifecycle

Content moves through a two-phase lifecycle:

**Phase 1: Upload**

```bash
POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}
```

- File is stored in blob storage
- User-provided metadata is stored
- Content record is created
- **Not yet searchable or AI-ready**

**Phase 2: Ingestion**

```bash
POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/{contentId}/ingest
```

- Text extraction and parsing
- Chunking and embedding generation
- Ontology-driven queries execute (if defined)
- **Content becomes fully searchable and queryable**

This separation allows you to batch uploads and then trigger ingestion when ready, providing control over when compute-intensive AI processing occurs.

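The two-phase flow can be sketched in Python as follows. This is a minimal illustration, not an official client: the helper names (`upload_url`, `ingest_url`, `ingest_batch`) and the injected `post` callable are hypothetical, and only the URL patterns are taken from the endpoints above.

```python
from typing import Callable, Iterable, List

# URL pattern taken from the endpoints above; the helpers are hypothetical.
BASE = "https://sws-{workspace_id}.cloud.syncdocs.ai/api/content"


def upload_url(workspace_id: str, dataspace_id: str) -> str:
    """Phase 1 endpoint: stores the file and creates the content record."""
    return f"{BASE.format(workspace_id=workspace_id)}/{dataspace_id}"


def ingest_url(workspace_id: str, dataspace_id: str, content_id: str) -> str:
    """Phase 2 endpoint: triggers extraction, chunking, and embedding."""
    return f"{upload_url(workspace_id, dataspace_id)}/{content_id}/ingest"


def ingest_batch(
    workspace_id: str,
    dataspace_id: str,
    content_ids: Iterable[str],
    post: Callable[[str], object],
) -> List[str]:
    """Trigger ingestion for content that has already been uploaded.

    `post` is any callable that sends a POST to the given URL (for example,
    a thin wrapper around your HTTP client that adds authentication).
    Returns the URLs that were posted to, in order.
    """
    urls = [ingest_url(workspace_id, dataspace_id, cid) for cid in content_ids]
    for url in urls:
        post(url)
    return urls
```

Injecting `post` keeps the sketch independent of any particular HTTP library and lets you decide when the compute-intensive ingestion step runs, for example after a whole batch of uploads has completed.
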
## Content Structure

Each content object has the following fields:

**Core Fields**:

- `contentId` (UUID): Unique identifier
- `dataspaceId` (string): The dataspace containing this content
- `fileName` (string): Original file name
- `fileFormat` (string): MIME type (e.g., `application/pdf`)
- `createdAt` (timestamp): Upload time
- `updatedAt` (timestamp): Last modification time

**Enrichment Fields**:

- `categoryId` (UUID, nullable): Category from ontology taxonomy
- `metadata` (JSON object): Arbitrary user-provided and AI-extracted data
- `inferenceTaskExecutions` (object): Record of AI operations performed
- `fileUrl` (string): URL to download the original file
- `fileSize` (number): File size in bytes

**Example Content Object**:

```json
{
  "contentId": "550e8400-e29b-41d4-a716-446655440000",
  "dataspaceId": "sds-abc12345",
  "fileName": "contract-acme-2024.pdf",
  "fileFormat": "application/pdf",
  "categoryId": "cat-contract-uuid",
  "metadata": {
    "customerName": "Acme Corp",
    "contractType": "Master Service Agreement",
    "effectiveDate": "2024-01-15",
    "annualValue": 250000,
    "extractedParties": ["Acme Corp", "Tech Innovations LLC"]
  },
  "inferenceTaskExecutions": {
    "effectiveDate": "task-execution-uuid-1",
    "extractedParties": "task-execution-uuid-2"
  },
  "fileUrl": "https://sws-12345678.cloud.syncdocs.ai/api/content/sds-abc12345/550e8400.../download",
  "fileSize": 2048576,
  "createdAt": "2024-10-28T10:30:00Z",
  "updatedAt": "2024-10-28T10:35:00Z"
}
```

## Creating Content

Content is created via multipart form upload to the Workspace API.

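As a rough sketch of how a client might prepare the text fields of such an upload, the hypothetical helper below assembles `fileName`, `fileFormat`, `categoryId`, and `metadata` into a flat dictionary. It assumes, without confirmation from the API documentation, that nested metadata is sent as a JSON-encoded string, since multipart form fields carry text or bytes rather than nested objects.

```python
import json
from typing import Any, Dict, Optional


def build_upload_fields(
    file_name: str,
    file_format: str,
    metadata: Dict[str, Any],
    category_id: Optional[str] = None,
) -> Dict[str, str]:
    """Assemble the non-file form fields for a content upload.

    Assumption: the metadata object is serialized to a JSON string.
    Check the API reference for the encoding the service expects.
    """
    fields = {
        "fileName": file_name,
        "fileFormat": file_format,
        "metadata": json.dumps(metadata, sort_keys=True),
    }
    # categoryId is nullable, so only send it when one is provided.
    if category_id is not None:
        fields["categoryId"] = category_id
    return fields


# With the third-party `requests` library, the fields could be sent as:
#   requests.post(url, data=fields, files={"file": open(path, "rb")},
#                 headers={"Authorization": f"Bearer {token}"})
```

Keeping the field assembly separate from the HTTP call makes it easy to validate or log the payload before any file bytes are transferred.
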
### Example: Upload with Metadata

```bash
POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}
Content-Type: multipart/form-data
Authorization: Bearer <token>

{
  "file": <binary file data>,
  "categoryId": "cat-contract-uuid",
  "fileFormat": "application/pdf",
  "fileName": "contract-acme-2024.pdf",
  "metadata": {
    "customerName": "Acme Corp",
    "contractType": "Master Service Agreement",
    "signedDate": "2024-01-10",
    "internalId": "CONT-2024-0042"
  }
}
```

**Response**:

```json
{
  "contentId": "550e8400-e29b-41d4-a716-446655440000",
  "dataspaceId": "sds-abc12345",
  "fileName": "contract-acme-2024.pdf",
  "fileFormat": "application/pdf",
  "categoryId": "cat-contract-uuid",
  "metadata": {
    "customerName": "Acme Corp",
    "contractType": "Master Service Agreement",
    "signedDate": "2024-01-10",
    "internalId": "CONT-2024-0042"
  },
  "createdAt": "2024-10-28T10:30:00Z"
}
```

### Validation Options

By default, Sync validates that:

- The specified `categoryId` exists in the dataspace's ontology
- All required metadata fields (defined in the ontology) are provided
- Metadata field types match ontology definitions

To skip validation (useful for bulk imports or partial data):

```bash
POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}?validateRequiredMetadata=false
```

## Metadata

Content metadata is a flexible JSON object that can contain any fields you need.

**User-Provided Metadata**: Added during upload, with a completely arbitrary structure. No schema is required unless you define an ontology.

```json
{
  "customerName": "Acme Corp",
  "priority": "high",
  "tags": ["urgent", "contract"],
  "customFields": {
    "any": "structure"
  }
}
```

**AI-Extracted Metadata**: Added during ingestion if an ontology with metadata queries is defined. These fields become queryable alongside user-provided data.

```json
{
  "customerName": "Acme Corp",            // User-provided
  "effectiveDate": "2024-01-15",          // AI-extracted
  "parties": ["Acme Corp", "Tech LLC"],   // AI-extracted
  "termLength": 36                        // AI-extracted
}
```

Both types of metadata are stored in the same `metadata` field and are equally queryable via API or SQL.

## Accessing Content

### List Content with Filtering

```bash
GET https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}?
  category=cat-contract-uuid&
  pageSize=50&
  cursor={nextPageCursor}
```

### Get Specific Content

```bash
GET https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/{contentId}
```

### Download Original File

```bash
GET https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/{contentId}/download
```

### Query by Metadata (SQL)

```sql
-- Find all contracts over $200k
SELECT
  id as content_id,
  file_name,
  metadata->>'customerName' as customer,
  (metadata->>'annualValue')::numeric as value
FROM content
WHERE category_id = 'cat-contract-uuid'
  AND (metadata->>'annualValue')::numeric > 200000
ORDER BY (metadata->>'annualValue')::numeric DESC;
```

### Semantic Search

```bash
POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/query

{
  "query": "Find all contracts with auto-renewal clauses",
  "filters": {
    "categoryId": "cat-contract-uuid"
  }
}
```

## Next Steps

- [Understand Queries](/concepts/queries) - Learn how to search and analyze content
- [Explore Ontologies](/concepts/ontologies) - Define schemas and extraction rules
- [Learn about Projects](/concepts/projects) - Organize content into business workflows
- [API Reference](/api) - Complete API documentation