Content

Content is the fundamental unit of data in Sync—a file paired with its AI-generated derivatives. Each piece of content represents a document, image, video, CAD file, or any other file type, along with all the structured data extracted from it.

What is Content?

Content in Sync is more than just a file. It's a comprehensive data object that includes:

Original file: The raw uploaded file (PDF, image, spreadsheet, etc.)
Extracted text: Full text extracted using specialized parsers
Vector embeddings: Semantic representations for AI-powered search
Structured metadata: Both user-provided and AI-extracted fields
Derivatives: Thumbnails, previews, and processed outputs

Unlike traditional file storage systems that only track files and basic metadata, Sync treats every piece of content as a rich, queryable data object that can be searched semantically, filtered by structured fields, and analyzed using SQL.

Content Lifecycle

Content moves through a two-phase lifecycle:

Phase 1: Upload

POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}

File is stored in blob storage
User-provided metadata is stored
Content record is created
Not yet searchable or AI-ready

Phase 2: Ingestion

POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/{contentId}/ingest

Text extraction and parsing
Chunking and embedding generation
Ontology-driven queries execute (if defined)
Content becomes fully searchable and queryable

This separation allows you to batch uploads and then trigger ingestion when ready, providing control over when compute-intensive AI processing occurs.
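As a minimal sketch of that batching pattern, the standard-library Python below builds (but does not send) one Phase 2 ingestion request per previously uploaded item. Only the URL patterns and the Bearer header come from this page. The helper names are hypothetical, and sending the same Bearer token to the ingest endpoint is an assumption.

```python
from urllib.request import Request

# URL pattern taken from the endpoints above.
BASE = "https://sws-{workspace_id}.cloud.syncdocs.ai/api/content/{dataspace_id}"


def content_base(workspace_id: str, dataspace_id: str) -> str:
    """Return the content collection URL for one workspace and dataspace."""
    return BASE.format(workspace_id=workspace_id, dataspace_id=dataspace_id)


def ingest_request(workspace_id: str, dataspace_id: str, content_id: str, token: str) -> Request:
    """Build (but do not send) the Phase 2 ingestion request for one content item.

    ASSUMPTION: ingestion uses the same Bearer auth as the upload example.
    """
    url = f"{content_base(workspace_id, dataspace_id)}/{content_id}/ingest"
    return Request(url, method="POST", headers={"Authorization": f"Bearer {token}"})


def ingest_batch(workspace_id: str, dataspace_id: str, content_ids: list[str], token: str) -> list[Request]:
    """Phase 2 for a batch: one ingestion request per already-uploaded item."""
    return [ingest_request(workspace_id, dataspace_id, cid, token) for cid in content_ids]
```

Each request can then be sent with `urllib.request.urlopen(req)` whenever you choose to start the compute-intensive processing.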

Content Structure

Each content object has the following fields:

Core Fields:

contentId (UUID): Unique identifier
dataspaceId (string): The dataspace containing this content
fileName (string): Original file name
fileFormat (string): MIME type (e.g., application/pdf)
createdAt (timestamp): Upload time
updatedAt (timestamp): Last modification time

Enrichment Fields:

categoryId (UUID, nullable): Category from the ontology's taxonomy
metadata (JSON object): Arbitrary user-provided and AI-extracted data
inferenceTaskExecutions (object): Record of AI operations performed; maps each AI-extracted metadata field to the ID of the task execution that produced it
fileUrl (string): URL to download the original file
fileSize (number): File size in bytes

Example Content Object:

{
  "contentId": "550e8400-e29b-41d4-a716-446655440000",
  "dataspaceId": "sds-abc12345",
  "fileName": "contract-acme-2024.pdf",
  "fileFormat": "application/pdf",
  "categoryId": "cat-contract-uuid",
  "metadata": {
    "customerName": "Acme Corp",
    "contractType": "Master Service Agreement",
    "effectiveDate": "2024-01-15",
    "annualValue": 250000,
    "extractedParties": ["Acme Corp", "Tech Innovations LLC"]
  },
  "inferenceTaskExecutions": {
    "effectiveDate": "task-execution-uuid-1",
    "extractedParties": "task-execution-uuid-2"
  },
  "fileUrl": "https://sws-12345678.cloud.syncdocs.ai/api/content/sds-abc12345/550e8400.../download",
  "fileSize": 2048576,
  "createdAt": "2024-10-28T10:30:00Z",
  "updatedAt": "2024-10-28T10:35:00Z"
}
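If you consume the API from Python, a typed wrapper keeps the camelCase field names in one place. This is a sketch: `Content` and `parse_content` are hypothetical helpers, and treating every field outside the core set as optional is an assumption based on the upload response shown later, which omits several of them.

```python
import json
from dataclasses import dataclass, field
from datetime import datetime
from typing import Any, Optional


@dataclass
class Content:
    """Typed view of the content object fields listed above."""
    content_id: str
    dataspace_id: str
    file_name: str
    file_format: str
    created_at: datetime
    updated_at: Optional[datetime] = None
    category_id: Optional[str] = None  # nullable
    metadata: dict[str, Any] = field(default_factory=dict)
    inference_task_executions: dict[str, str] = field(default_factory=dict)
    file_url: Optional[str] = None
    file_size: Optional[int] = None  # bytes


def _ts(value: Optional[str]) -> Optional[datetime]:
    # datetime.fromisoformat() accepts a trailing "Z" only from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00")) if value else None


def parse_content(raw: str) -> Content:
    """Parse a content object from its JSON text."""
    d = json.loads(raw)
    return Content(
        content_id=d["contentId"],
        dataspace_id=d["dataspaceId"],
        file_name=d["fileName"],
        file_format=d["fileFormat"],
        created_at=_ts(d["createdAt"]),
        updated_at=_ts(d.get("updatedAt")),
        category_id=d.get("categoryId"),
        metadata=d.get("metadata") or {},
        inference_task_executions=d.get("inferenceTaskExecutions") or {},
        file_url=d.get("fileUrl"),
        file_size=d.get("fileSize"),
    )
```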

Creating Content

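Content is created via a multipart form upload to the Workspace API. The body below lists the form fields schematically: in the actual request, each field is sent as its own part of the multipart/form-data body, not as a single JSON object.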

Example: Upload with Metadata

POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}
Content-Type: multipart/form-data
Authorization: Bearer <token>

{
  "file": <binary data>,
  "categoryId": "cat-contract-uuid",
  "fileFormat": "application/pdf",
  "fileName": "contract-acme-2024.pdf",
  "metadata": {
    "customerName": "Acme Corp",
    "contractType": "Master Service Agreement",
    "signedDate": "2024-01-10",
    "internalId": "CONT-2024-0042"
  }
}

Response:

{
  "contentId": "550e8400-e29b-41d4-a716-446655440000",
  "dataspaceId": "sds-abc12345",
  "fileName": "contract-acme-2024.pdf",
  "fileFormat": "application/pdf",
  "categoryId": "cat-contract-uuid",
  "metadata": {
    "customerName": "Acme Corp",
    "contractType": "Master Service Agreement",
    "signedDate": "2024-01-10",
    "internalId": "CONT-2024-0042"
  },
  "createdAt": "2024-10-28T10:30:00Z"
}
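For reference, here is a standard-library sketch that builds the multipart request by hand. The field names come from the example above. Sending `metadata` as a JSON-encoded text part is an assumption, as is the helper name. Passing `validate=False` appends the `validateRequiredMetadata=false` parameter described under Validation Options.

```python
import json
import uuid
from urllib.parse import urlencode
from urllib.request import Request


def build_upload_request(workspace_id, dataspace_id, token, file_name, file_bytes,
                         file_format, category_id=None, metadata=None, validate=True):
    """Build (but do not send) a multipart/form-data upload request."""
    boundary = uuid.uuid4().hex
    fields = {"fileName": file_name, "fileFormat": file_format}
    if category_id is not None:
        fields["categoryId"] = category_id
    if metadata is not None:
        # ASSUMPTION: metadata travels as a JSON-encoded text field.
        fields["metadata"] = json.dumps(metadata)

    parts = []
    for name, value in fields.items():
        header = f'--{boundary}\r\nContent-Disposition: form-data; name="{name}"\r\n\r\n'
        parts.append(header.encode("utf-8") + str(value).encode("utf-8") + b"\r\n")
    # The binary part carries its own filename and content type.
    file_header = (f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
                   f'filename="{file_name}"\r\nContent-Type: {file_format}\r\n\r\n')
    parts.append(file_header.encode("utf-8") + file_bytes + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))  # closing delimiter

    url = f"https://sws-{workspace_id}.cloud.syncdocs.ai/api/content/{dataspace_id}"
    if not validate:
        url += "?" + urlencode({"validateRequiredMetadata": "false"})
    return Request(url, data=b"".join(parts), method="POST", headers={
        "Authorization": f"Bearer {token}",
        "Content-Type": f"multipart/form-data; boundary={boundary}",
    })
```

In practice, an HTTP client such as `requests` (its `files=` argument) can do this encoding for you; the manual version just makes the wire format visible.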

Validation Options

By default, Sync validates that:

The specified categoryId exists in the dataspace's ontology
All required metadata fields (defined in the ontology) are provided
Metadata field types match ontology definitions
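Before a large import, you might mirror these three checks locally to catch problems without a round trip. In this sketch the ontology dictionary, its `type`/`required` keys, and `precheck` are all hypothetical; Sync's actual ontology format is not described on this page, and the server remains the authority.

```python
from typing import Any

# HYPOTHETICAL ontology shape used only for this sketch:
# category ID -> field name -> {"type": ..., "required": ...}
ONTOLOGY = {
    "cat-contract-uuid": {
        "customerName": {"type": "string", "required": True},
        "contractType": {"type": "string", "required": True},
        "annualValue": {"type": "number", "required": False},
    }
}

# Map the hypothetical type names onto Python types.
TYPES = {"string": str, "number": (int, float), "boolean": bool, "array": list}


def precheck(category_id: str, metadata: dict[str, Any], ontology=ONTOLOGY) -> list[str]:
    """Mirror the three server-side checks; return a list of problems (empty if none)."""
    if category_id not in ontology:  # check 1: category exists
        return [f"unknown category {category_id!r}"]
    errors = []
    for name, spec in ontology[category_id].items():
        if name not in metadata:  # check 2: required fields present
            if spec["required"]:
                errors.append(f"missing required field {name!r}")
            continue
        value = metadata[name]  # check 3: types match
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        is_bool_number = spec["type"] == "number" and isinstance(value, bool)
        if not isinstance(value, TYPES[spec["type"]]) or is_bool_number:
            errors.append(f"field {name!r} should be {spec['type']}")
    return errors
```

Fields that the ontology does not mention are ignored, since metadata may contain arbitrary extra keys.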

To skip validation (useful for bulk imports or partial data):

POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}?validateRequiredMetadata=false

Metadata

Content metadata is a flexible JSON object that can contain any fields you need.

User-Provided Metadata: Added during upload. The structure is entirely up to you; no schema is required unless you define an ontology.

{
  "customerName": "Acme Corp",
  "priority": "high",
  "tags": ["urgent", "contract"],
  "customFields": { "any": "structure" }
}

AI-Extracted Metadata: Added during ingestion if an ontology with metadata queries is defined. These become queryable fields alongside user-provided data.

{
  "customerName": "Acme Corp",           // User-provided
  "effectiveDate": "2024-01-15",         // AI-extracted
  "parties": ["Acme Corp", "Tech LLC"],  // AI-extracted
  "termLength": 36                       // AI-extracted
}

Both types of metadata are stored in the same metadata field and are equally queryable via API or SQL.
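Because both kinds share one field, provenance is not visible from `metadata` alone. In the example content object, `inferenceTaskExecutions` lists the AI-extracted fields, so this sketch uses that map to separate them. Treating every key in that map as AI-extracted is an assumption drawn from the example, and `split_metadata` is a hypothetical helper.

```python
from typing import Any


def split_metadata(content: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Split a content object's metadata into (user_provided, ai_extracted).

    ASSUMPTION: a field is AI-extracted when it has an entry in
    inferenceTaskExecutions, as in the example content object.
    """
    metadata = content.get("metadata") or {}
    ai_keys = set(content.get("inferenceTaskExecutions") or {})
    user = {k: v for k, v in metadata.items() if k not in ai_keys}
    ai = {k: v for k, v in metadata.items() if k in ai_keys}
    return user, ai
```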

Accessing Content

List Content with Filtering

GET https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}?
  category=cat-contract-uuid&
  pageSize=50&
  cursor={nextPageCursor}
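A sketch of cursor pagination over the list endpoint follows. The query parameter names come from the URL above. The response keys (`items`, `nextPageCursor`) and the rule that the last page omits the cursor are hypothetical, so check the API Reference for the real shape. The page fetcher is passed in so the loop stays independent of any HTTP client.

```python
from typing import Any, Callable, Iterator, Optional
from urllib.parse import urlencode


def list_url(workspace_id: str, dataspace_id: str, category: Optional[str] = None,
             page_size: int = 50, cursor: Optional[str] = None) -> str:
    """Build the list URL with the query parameters shown above."""
    params: dict[str, Any] = {}
    if category:
        params["category"] = category
    params["pageSize"] = page_size
    if cursor:
        params["cursor"] = cursor
    return (f"https://sws-{workspace_id}.cloud.syncdocs.ai/api/content/"
            f"{dataspace_id}?{urlencode(params)}")


def iter_content(fetch_page: Callable[[Optional[str]], dict[str, Any]]) -> Iterator[dict[str, Any]]:
    """Yield every item across pages, following the cursor until none is returned.

    ASSUMPTION: each page is {"items": [...], "nextPageCursor": "..."} and the
    last page omits the cursor. These key names are hypothetical.
    """
    cursor = None
    while True:
        page = fetch_page(cursor)  # e.g. GET list_url(..., cursor=cursor), parsed as JSON
        yield from page.get("items", [])
        cursor = page.get("nextPageCursor")
        if not cursor:
            return
```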

Get Specific Content

GET https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/{contentId}

Download Original File

GET https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/{contentId}/download
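Original files can be large (CAD files, video), so a download is best streamed to disk rather than read into memory. This standard-library sketch copies the response in chunks. The helper name is hypothetical, and sending the Bearer token to the download endpoint is an assumption carried over from the upload example.

```python
import shutil
from typing import BinaryIO, Callable
from urllib.request import Request, urlopen


def download_content(workspace_id: str, dataspace_id: str, content_id: str, token: str,
                     dest: BinaryIO, opener: Callable = urlopen) -> None:
    """Stream the original file into an open binary file object.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without a network connection.
    """
    url = (f"https://sws-{workspace_id}.cloud.syncdocs.ai/api/content/"
           f"{dataspace_id}/{content_id}/download")
    # ASSUMPTION: the download endpoint accepts the same Bearer token.
    req = Request(url, headers={"Authorization": f"Bearer {token}"})
    with opener(req) as resp:
        # Copy in fixed-size chunks instead of calling resp.read() on the whole file.
        shutil.copyfileobj(resp, dest)
```

Usage: `with open("contract.pdf", "wb") as f: download_content(ws, ds, cid, token, f)`.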

Query by Metadata (SQL)

-- Find all contracts with an annual value over $200,000
SELECT
  id AS content_id,
  file_name,
  metadata->>'customerName' AS customer,
  (metadata->>'annualValue')::numeric AS value
FROM content
WHERE category_id = 'cat-contract-uuid'
  AND (metadata->>'annualValue')::numeric > 200000
ORDER BY (metadata->>'annualValue')::numeric DESC;
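To make the query's semantics explicit, here is the same filter written in Python over content objects you have already fetched. It reproduces two details of the SQL: `->>` yields NULL for a missing key, so such rows drop out of the `>` comparison; and `::numeric` accepts either a JSON number or a numeric string. The function name is hypothetical.

```python
from decimal import Decimal
from typing import Any, Iterable


def contracts_over(items: Iterable[dict[str, Any]], category_id: str, threshold: int):
    """Client-side equivalent of the SQL above.

    Returns (content_id, file_name, customer, value) tuples sorted by value, descending.
    """
    rows = []
    for c in items:
        meta = c.get("metadata") or {}
        if c.get("categoryId") != category_id or meta.get("annualValue") is None:
            continue  # NULL > 200000 is not true in SQL, so these rows are excluded
        value = Decimal(str(meta["annualValue"]))  # like ::numeric: number or numeric string
        if value > threshold:
            rows.append((c["contentId"], c["fileName"], meta.get("customerName"), value))
    rows.sort(key=lambda r: r[3], reverse=True)  # ORDER BY ... DESC
    return rows
```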

Semantic Search

POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/query
{
  "query": "Find all contracts with auto-renewal clauses",
  "filters": {
    "categoryId": "cat-contract-uuid"
  }
}
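A standard-library sketch that builds the semantic search request above, with `filters` added only when a category is given. The body shape comes from the example. The JSON content type, the Bearer header (carried over from the upload example), and the helper name are assumptions. The response format is not described here, so the sketch stops at the request.

```python
import json
from typing import Any, Optional
from urllib.request import Request


def semantic_query_request(workspace_id: str, dataspace_id: str, token: str,
                           query: str, category_id: Optional[str] = None) -> Request:
    """Build (but do not send) a semantic search request with the body shape shown above."""
    body: dict[str, Any] = {"query": query}
    if category_id is not None:
        body["filters"] = {"categoryId": category_id}
    url = f"https://sws-{workspace_id}.cloud.syncdocs.ai/api/content/{dataspace_id}/query"
    return Request(url, data=json.dumps(body).encode("utf-8"), method="POST", headers={
        # ASSUMPTION: same Bearer auth as the upload example; JSON request body.
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    })
```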

Next Steps

Understand Queries - Learn how to search and analyze content
Explore Ontologies - Define schemas and extraction rules
Learn about Projects - Organize content into business workflows
API Reference - Complete API documentation
