Content

Content is the fundamental unit of data in Sync: a file paired with its AI-generated derivatives. Each piece of content represents a document, image, video, CAD file, or any other file type, along with all the structured data extracted from it.

What is Content?

Content in Sync is more than just a file. It's a comprehensive data object that includes:

  • Original file: The raw uploaded file (PDF, image, spreadsheet, etc.)
  • Extracted text: Full text extracted using specialized parsers
  • Vector embeddings: Semantic representations for AI-powered search
  • Structured metadata: Both user-provided and AI-extracted fields
  • Derivatives: Thumbnails, previews, and processed outputs

Unlike traditional file storage systems that only track files and basic metadata, Sync treats every piece of content as a rich, queryable data object that can be searched semantically, filtered by structured fields, and analyzed using SQL.

Content Lifecycle

Content moves through a two-phase lifecycle:

Phase 1: Upload

POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}
  • File is stored in blob storage
  • User-provided metadata is stored
  • Content record is created
  • Not yet searchable or AI-ready

Phase 2: Ingestion

POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/{contentId}/ingest
  • Text extraction and parsing
  • Chunking and embedding generation
  • Ontology-driven queries execute (if defined)
  • Content becomes fully searchable and queryable

This separation allows you to batch uploads and then trigger ingestion when ready, providing control over when compute-intensive AI processing occurs.
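
The batch-then-ingest pattern can be sketched as follows. This is a minimal illustration, not an official client: the `upload` and `trigger` callables are hypothetical stand-ins for whatever HTTP layer you use, and only the two URL shapes come from the endpoints above.

```python
from typing import Callable, Iterable

BASE = "https://sws-{workspace_id}.cloud.syncdocs.ai/api/content/{dataspace_id}"


def upload_url(workspace_id: str, dataspace_id: str) -> str:
    """Phase 1 endpoint: creates the content record."""
    return BASE.format(workspace_id=workspace_id, dataspace_id=dataspace_id)


def ingest_url(workspace_id: str, dataspace_id: str, content_id: str) -> str:
    """Phase 2 endpoint: triggers ingestion for one content record."""
    return f"{upload_url(workspace_id, dataspace_id)}/{content_id}/ingest"


def upload_then_ingest(
    files: Iterable[str],
    upload: Callable[[str, str], str],  # hypothetical: (url, path) -> contentId
    trigger: Callable[[str], None],     # hypothetical: sends a POST to the URL
    workspace_id: str,
    dataspace_id: str,
) -> list[str]:
    """Upload every file first, then trigger ingestion for the whole batch."""
    target = upload_url(workspace_id, dataspace_id)
    content_ids = [upload(target, path) for path in files]
    for content_id in content_ids:
        trigger(ingest_url(workspace_id, dataspace_id, content_id))
    return content_ids
```

Keeping the two loops separate means no ingestion work starts until every upload has succeeded.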

Content Structure

Each content object has the following fields:

Core Fields:

  • contentId (UUID): Unique identifier
  • dataspaceId (string): The dataspace containing this content
  • fileName (string): Original file name
  • fileFormat (string): MIME type (e.g., application/pdf)
  • createdAt (timestamp): Upload time
  • updatedAt (timestamp): Last modification time

Enrichment Fields:

  • categoryId (UUID, nullable): Category from ontology taxonomy
  • metadata (JSON object): Arbitrary user-provided and AI-extracted data
  • inferenceTaskExecutions (object): Record of AI operations performed
  • fileUrl (string): URL to download the original file
  • fileSize (number): File size in bytes

Example Content Object:

{
  "contentId": "550e8400-e29b-41d4-a716-446655440000",
  "dataspaceId": "sds-abc12345",
  "fileName": "contract-acme-2024.pdf",
  "fileFormat": "application/pdf",
  "categoryId": "cat-contract-uuid",
  "metadata": {
    "customerName": "Acme Corp",
    "contractType": "Master Service Agreement",
    "effectiveDate": "2024-01-15",
    "annualValue": 250000,
    "extractedParties": ["Acme Corp", "Tech Innovations LLC"]
  },
  "inferenceTaskExecutions": {
    "effectiveDate": "task-execution-uuid-1",
    "extractedParties": "task-execution-uuid-2"
  },
  "fileUrl": "https://sws-12345678.cloud.syncdocs.ai/api/content/sds-abc12345/550e8400.../download",
  "fileSize": 2048576,
  "createdAt": "2024-10-28T10:30:00Z",
  "updatedAt": "2024-10-28T10:35:00Z"
}
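
On the client side, a content object like the one above could be mapped to a typed record. The sketch below is one possible shape, not an official SDK type; treating `updatedAt`, `fileUrl`, `fileSize`, and `inferenceTaskExecutions` as optional is an assumption, made because not every response shown in this document includes them.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class Content:
    content_id: str
    dataspace_id: str
    file_name: str
    file_format: str
    created_at: datetime
    updated_at: Optional[datetime] = None
    category_id: Optional[str] = None
    metadata: dict[str, Any] = field(default_factory=dict)
    inference_task_executions: dict[str, str] = field(default_factory=dict)
    file_url: Optional[str] = None
    file_size: Optional[int] = None


def parse_timestamp(value: str) -> datetime:
    # Older Python versions reject a trailing "Z", so normalize it to an offset.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def parse_content(raw: dict[str, Any]) -> Content:
    """Convert a decoded JSON content object into a Content record."""
    updated = raw.get("updatedAt")
    return Content(
        content_id=raw["contentId"],
        dataspace_id=raw["dataspaceId"],
        file_name=raw["fileName"],
        file_format=raw["fileFormat"],
        created_at=parse_timestamp(raw["createdAt"]),
        updated_at=parse_timestamp(updated) if updated else None,
        category_id=raw.get("categoryId"),
        metadata=raw.get("metadata", {}),
        inference_task_executions=raw.get("inferenceTaskExecutions", {}),
        file_url=raw.get("fileUrl"),
        file_size=raw.get("fileSize"),
    )
```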

Creating Content

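Content is created via multipart form upload to the Workspace API. The request body in the example below lists the form fields in JSON-like notation for readability; on the wire it is sent as multipart/form-data.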

Example: Upload with Metadata

POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}
Content-Type: multipart/form-data
Authorization: Bearer <token>

{
  "file": <binary data>,
  "categoryId": "cat-contract-uuid",
  "fileFormat": "application/pdf",
  "fileName": "contract-acme-2024.pdf",
  "metadata": {
    "customerName": "Acme Corp",
    "contractType": "Master Service Agreement",
    "signedDate": "2024-01-10",
    "internalId": "CONT-2024-0042"
  }
}

Response:

{
  "contentId": "550e8400-e29b-41d4-a716-446655440000",
  "dataspaceId": "sds-abc12345",
  "fileName": "contract-acme-2024.pdf",
  "fileFormat": "application/pdf",
  "categoryId": "cat-contract-uuid",
  "metadata": {
    "customerName": "Acme Corp",
    "contractType": "Master Service Agreement",
    "signedDate": "2024-01-10",
    "internalId": "CONT-2024-0042"
  },
  "createdAt": "2024-10-28T10:30:00Z"
}
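
The upload request above could be assembled with the Python standard library roughly as follows. This is a sketch under assumptions: sending `metadata` as a JSON-encoded string form field is a guess at the wire format, and `encode_multipart` and `build_upload_request` are illustrative helper names, not part of any Sync SDK. The request is only built here, not sent.

```python
import json
import uuid
from typing import Any
from urllib.request import Request


def encode_multipart(
    fields: dict[str, str], file_name: str, file_bytes: bytes, file_format: str
) -> tuple[bytes, str]:
    """Return (body, Content-Type header value) for a multipart/form-data request."""
    boundary = uuid.uuid4().hex
    chunks: list[bytes] = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        chunks.append(header.encode("utf-8") + value.encode("utf-8") + b"\r\n")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'
        f"Content-Type: {file_format}\r\n\r\n"
    )
    chunks.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    chunks.append(f"--{boundary}--\r\n".encode("utf-8"))
    return b"".join(chunks), f"multipart/form-data; boundary={boundary}"


def build_upload_request(
    url: str, token: str, file_name: str, file_bytes: bytes,
    file_format: str, category_id: str, metadata: dict[str, Any],
) -> Request:
    fields = {
        "categoryId": category_id,
        "fileFormat": file_format,
        "fileName": file_name,
        # Assumption: metadata travels as a JSON string in its own form field.
        "metadata": json.dumps(metadata),
    }
    body, content_type = encode_multipart(fields, file_name, file_bytes, file_format)
    return Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": content_type, "Authorization": f"Bearer {token}"},
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; in practice an HTTP library with built-in multipart support does the same job with less code.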

Validation Options

By default, Sync validates that:

  • The specified categoryId exists in the dataspace's ontology
  • All required metadata fields (defined in the ontology) are provided
  • Metadata field types match ontology definitions

To skip validation (useful for bulk imports or partial data), set the validateRequiredMetadata query parameter to false:

POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}?validateRequiredMetadata=false
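
Before deciding whether to disable validation, it can help to run a similar check locally. The sketch below is a client-side approximation only: the `required` mapping (field name to Python type) is a hypothetical stand-in for an ontology definition, whose real format is not described here, and the server's own validation rules may differ.

```python
from typing import Any
from urllib.parse import urlencode


def check_metadata(metadata: dict[str, Any], required: dict[str, type]) -> list[str]:
    """Report required fields that are missing or have an unexpected type."""
    problems = []
    for name, expected in required.items():
        if name not in metadata:
            problems.append(f"missing required field: {name}")
        elif not isinstance(metadata[name], expected):
            actual = type(metadata[name]).__name__
            problems.append(f"{name}: expected {expected.__name__}, got {actual}")
    return problems


def content_upload_url(workspace_id: str, dataspace_id: str, validate: bool = True) -> str:
    """Build the upload URL, adding the opt-out flag only when validation is skipped."""
    url = f"https://sws-{workspace_id}.cloud.syncdocs.ai/api/content/{dataspace_id}"
    if not validate:
        url += "?" + urlencode({"validateRequiredMetadata": "false"})
    return url
```

A bulk import might upload with validation on when `check_metadata` returns no problems and fall back to `validate=False` for known-partial records.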

Metadata

Content metadata is a flexible JSON object that can contain any fields you need.

User-Provided Metadata: Added during upload, with a completely arbitrary structure. No schema is required unless you define an ontology.

{
  "customerName": "Acme Corp",
  "priority": "high",
  "tags": ["urgent", "contract"],
  "customFields": { "any": "structure" }
}

AI-Extracted Metadata: Added during ingestion if an ontology with metadata queries is defined. These become queryable fields alongside user-provided data.

{
  "customerName": "Acme Corp",           // User-provided
  "effectiveDate": "2024-01-15",        // AI-extracted
  "parties": ["Acme Corp", "Tech LLC"],  // AI-extracted
  "termLength": 36                       // AI-extracted
}

Both types of metadata are stored in the same metadata field and are equally queryable via API or SQL.
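
Because both kinds of metadata share one object, a client that needs to tell them apart has to infer provenance. One possible approach, sketched below, assumes that the keys of inferenceTaskExecutions name the AI-extracted fields, as the example content object earlier suggests; that correspondence is an assumption, not documented behavior.

```python
from typing import Any


def split_by_provenance(content: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split metadata into (user_provided, ai_extracted) dictionaries."""
    metadata = content.get("metadata", {})
    # Assumption: every key in inferenceTaskExecutions is an AI-extracted field.
    ai_keys = set(content.get("inferenceTaskExecutions", {}))
    user_provided = {k: v for k, v in metadata.items() if k not in ai_keys}
    ai_extracted = {k: v for k, v in metadata.items() if k in ai_keys}
    return user_provided, ai_extracted
```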

Accessing Content

List Content with Filtering

GET https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}?
  category=cat-contract-uuid&
  pageSize=50&
  cursor={nextPageCursor}
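
Cursor-based listing is typically consumed in a loop. The sketch below assumes a response shaped like {"items": [...], "nextPageCursor": "..."}; both field names are hypothetical, since the response format is not shown here. `fetch_page` is an injected callable that performs the authenticated GET and returns the decoded JSON.

```python
from typing import Any, Callable, Iterator, Optional
from urllib.parse import urlencode


def list_url(
    workspace_id: str, dataspace_id: str, category: str,
    page_size: int = 50, cursor: Optional[str] = None,
) -> str:
    """Build the list URL; the cursor parameter is omitted on the first page."""
    params: dict[str, Any] = {"category": category, "pageSize": page_size}
    if cursor is not None:
        params["cursor"] = cursor
    base = f"https://sws-{workspace_id}.cloud.syncdocs.ai/api/content/{dataspace_id}"
    return f"{base}?{urlencode(params)}"


def iter_content(
    fetch_page: Callable[[str], dict[str, Any]],
    workspace_id: str, dataspace_id: str, category: str, page_size: int = 50,
) -> Iterator[dict[str, Any]]:
    """Yield content objects page by page until no cursor is returned."""
    cursor: Optional[str] = None
    while True:
        page = fetch_page(list_url(workspace_id, dataspace_id, category, page_size, cursor))
        yield from page.get("items", [])            # hypothetical field name
        cursor = page.get("nextPageCursor")         # hypothetical field name
        if not cursor:
            break
```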

Get Specific Content

GET https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/{contentId}

Download Original File

GET https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/{contentId}/download

Query by Metadata (SQL)

-- Find all contracts over $200k
SELECT 
  id as content_id,
  file_name,
  metadata->>'customerName' as customer,
  (metadata->>'annualValue')::numeric as value
FROM content
WHERE category_id = 'cat-contract-uuid'
  AND (metadata->>'annualValue')::numeric > 200000
ORDER BY (metadata->>'annualValue')::numeric DESC;

Natural-language query with a category filter:

POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/query
{
  "query": "Find all contracts with auto-renewal clauses",
  "filters": {
    "categoryId": "cat-contract-uuid"
  }
}

Next Steps