Dataspaces

A Dataspace is Sync's primary unit of data organization: a structured repository that combines unstructured content (documents, images, CAD files, videos) with its AI-generated structured derivatives (embeddings, extracted metadata, text, summaries).

Dataspaces provide complete isolation between different business use cases, similar to how separate databases keep different application data segregated. Each dataspace has its own storage, metadata schema, and access controls.

What is a Dataspace?

At a high level, a dataspace is analogous to a table in a data lakehouse, but specifically designed for unstructured content. Each row represents a piece of content (a file), and the columns represent:

Original file (stored as a blob)
Extracted text (from PDFs, images via OCR, audio transcripts, etc.)
Vector embeddings (for semantic search)
Structured metadata (both user-provided and AI-extracted)
Derivatives (thumbnails, previews, page-level extractions)

This design makes it possible to store, search, and analyze unstructured content as if it were structured data—enabling SQL queries, BI tools, and traditional analytics workflows alongside AI-powered semantic search.
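The row-and-column model above can be pictured with a small data structure. This is only an illustrative sketch: the class name `ContentRow`, its field names, and the sample values are hypothetical and are not Sync's actual schema.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class ContentRow:
    """One dataspace row: a file plus its derived columns (hypothetical field names)."""
    blob_uri: str                                            # original file, stored as a blob
    extracted_text: str = ""                                 # text from parsing, OCR, or transcription
    embedding: list[float] = field(default_factory=list)     # vector used for semantic search
    metadata: dict[str, Any] = field(default_factory=dict)   # user-provided and AI-extracted
    derivatives: list[str] = field(default_factory=list)     # thumbnails, previews, page extracts


def rows_matching(rows: list[ContentRow], key: str, value: Any) -> list[ContentRow]:
    """Filter rows on a metadata column, the way a structured query would."""
    return [r for r in rows if r.metadata.get(key) == value]


# Made-up sample row for illustration only.
row = ContentRow(
    blob_uri="blob://legal-contracts/msa-2024.pdf",
    extracted_text="This Master Services Agreement ...",
    embedding=[0.12, -0.48, 0.33],
    metadata={"contractType": "MSA", "counterparty": "Acme Corp"},
    derivatives=["thumbnails/msa-2024-p1.png"],
)
```

Treating each file as a row with typed columns is what lets the same content be filtered by metadata (as in `rows_matching`) and searched by embedding.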

Key Characteristics

Isolated Storage: Content, metadata, and embeddings in one dataspace are never shared with another dataspace, allowing you to:

Segregate data by department, project, or customer
Apply different security policies to different dataspaces
Scale storage independently for different use cases

Schema Flexibility: Dataspaces support arbitrary metadata—you can attach any JSON structure to your content without defining a schema upfront. Optionally, you can define an ontology to:

Enforce validation rules on uploaded metadata
Automatically extract structured data from documents during ingestion
Define categories and taxonomies for your content
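To make the idea of ontology-driven validation concrete, here is a minimal sketch. The ontology format shown (a dict with `required` and `categories` keys) and the function `validate_metadata` are assumptions made for illustration; Sync's real ontology format is described in the Ontologies documentation and may look quite different.

```python
from typing import Any

# Hypothetical ontology fragment: required fields with types, plus allowed categories.
ONTOLOGY = {
    "required": {"contractType": str, "effectiveDate": str},
    "categories": {"contractType": {"MSA", "NDA", "SOW"}},
}


def validate_metadata(metadata: dict[str, Any], ontology: dict[str, Any]) -> list[str]:
    """Return a list of validation errors; an empty list means the metadata passes."""
    errors = []
    # Check that every required field is present and has the expected type.
    for name, expected_type in ontology["required"].items():
        if name not in metadata:
            errors.append(f"missing required field: {name}")
        elif not isinstance(metadata[name], expected_type):
            errors.append(f"{name} must be {expected_type.__name__}")
    # Check that categorical fields use a value from the allowed set.
    for name, allowed in ontology["categories"].items():
        value = metadata.get(name)
        if isinstance(value, str) and value not in allowed:
            errors.append(f"{name} has unknown category: {value!r}")
    return errors
```

Without an ontology, any JSON object would be accepted as-is; with one, a check along these lines could reject or flag uploads before they are stored.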

Multi-Format Support: Dataspaces handle over 40 different file types, including:

Documents (PDF, Word, Excel, PowerPoint)
Images (JPEG, PNG, TIFF with OCR)
CAD files (AutoCAD, SolidWorks)
Media (videos with transcripts, audio with speech-to-text)
Archives (ZIP, TAR with automatic extraction)

Direct Access: Data in a dataspace can be accessed through:

REST API: Full-featured programmatic access
SQL: Direct database connections for BI tools and analytics
Cloud Provider APIs: Native blob storage access (e.g., S3 API)
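As a rough picture of what SQL access makes possible, the sketch below runs an aggregate query over content metadata. It uses an in-memory SQLite database purely as a stand-in: the table name `content`, its columns, and the sample rows are all hypothetical, and the connection details and schema that Sync actually exposes are not covered here.

```python
import json
import sqlite3

# In-memory SQLite stands in for the dataspace's SQL endpoint (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE content (id TEXT PRIMARY KEY, file_name TEXT, file_type TEXT, metadata TEXT)"
)
# Made-up rows; metadata is stored as a JSON string for this illustration.
conn.executemany(
    "INSERT INTO content VALUES (?, ?, ?, ?)",
    [
        ("c1", "msa-acme.pdf", "pdf", json.dumps({"contractType": "MSA"})),
        ("c2", "nda-globex.pdf", "pdf", json.dumps({"contractType": "NDA"})),
        ("c3", "site-plan.dwg", "dwg", json.dumps({})),
    ],
)

# A typical BI-style question: how many files of each type are stored?
counts = conn.execute(
    "SELECT file_type, COUNT(*) FROM content GROUP BY file_type ORDER BY file_type"
).fetchall()
```

The same kind of query could be issued from a BI tool once it is connected to the real SQL interface.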

Creating a Dataspace

Dataspaces are created via the Admin API and associated with an account and an ontology.

Example: Create a Dataspace

POST https://cloud.syncdocs.ai/api/accounts/{accountId}/dataspaces
Authorization: Bearer <token>
Content-Type: application/json

{
  "name": "Legal Contracts",
  "description": "Repository for all legal contracts and agreements",
  "ontologyId": "ont-3fa85f64-5717-4562-b3fc-2c963f66afa6"
}

Response:

{
  "id": "sds-12345678-abcd-1234-efgh-123456789012",
  "accountId": "acc-98765432-dcba-4321-hgfe-987654321098",
  "name": "Legal Contracts",
  "description": "Repository for all legal contracts and agreements",
  "ontologyId": "ont-3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "createdAt": "2024-10-28T12:00:00Z",
  "updatedAt": "2024-10-28T12:00:00Z"
}

Parameters:

name (required): Human-readable name for the dataspace
description (required): Description of the dataspace's purpose
ontologyId (required): ID of the ontology (a UUID with an ont- prefix, as in the request above) that defines how content is organized and validated in this dataspace
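The request above could be issued from Python roughly as follows. This is a sketch using only the standard library; the helper names `build_create_dataspace_request` and `create_dataspace` are hypothetical, and error handling, retries, and token acquisition are left out.

```python
import json
import urllib.request

BASE_URL = "https://cloud.syncdocs.ai/api"


def build_create_dataspace_request(account_id, token, name, description, ontology_id):
    """Build (but do not send) the POST request that creates a dataspace."""
    payload = {"name": name, "description": description, "ontologyId": ontology_id}
    # All three body parameters are documented as required.
    missing = [key for key, value in payload.items() if not value]
    if missing:
        raise ValueError(f"missing required parameters: {', '.join(missing)}")
    return urllib.request.Request(
        url=f"{BASE_URL}/accounts/{account_id}/dataspaces",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def create_dataspace(request):
    """Send the request and return the parsed JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Separating request construction from sending keeps the payload easy to inspect or log before any network call is made. A call would look like `create_dataspace(build_create_dataspace_request(account_id, token, "Legal Contracts", "Repository for all legal contracts and agreements", ontology_id))`.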

Example: List All Dataspaces

GET https://cloud.syncdocs.ai/api/accounts/{accountId}/dataspaces
Authorization: Bearer <token>

Response:

[
  {
    "id": "sds-12345678...",
    "name": "Legal Contracts",
    "description": "Repository for all legal contracts and agreements",
    "ontologyId": "ont-3fa85f64...",
    "createdAt": "2024-10-28T12:00:00Z"
  },
  {
    "id": "sds-87654321...",
    "name": "Marketing Assets",
    "description": "All marketing collateral and brand assets",
    "ontologyId": "ont-abc12345...",
    "createdAt": "2024-10-25T09:15:00Z"
  }
]
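A client would typically list dataspaces and then look one up by name. The sketch below builds the GET request and indexes a response shaped like the one above; the helper names are hypothetical, and the sample IDs are the truncated placeholders from the example, not real identifiers.

```python
import json
import urllib.request

BASE_URL = "https://cloud.syncdocs.ai/api"


def build_list_dataspaces_request(account_id, token):
    """Build (but do not send) the GET request that lists an account's dataspaces."""
    return urllib.request.Request(
        url=f"{BASE_URL}/accounts/{account_id}/dataspaces",
        headers={"Authorization": f"Bearer {token}"},
        method="GET",
    )


def index_by_name(dataspaces):
    """Map each dataspace name to its ID for quick lookup."""
    return {d["name"]: d["id"] for d in dataspaces}


# Response body shaped like the documented example (IDs truncated as in the source).
sample_response = json.loads("""
[
  {"id": "sds-12345678...", "name": "Legal Contracts", "ontologyId": "ont-3fa85f64..."},
  {"id": "sds-87654321...", "name": "Marketing Assets", "ontologyId": "ont-abc12345..."}
]
""")

lookup = index_by_name(sample_response)
```

Note that names are not stated to be unique, so a real client may prefer to keep the full list and match on ID.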

Using a Dataspace

Once created, you interact with a dataspace primarily through a workspace. Workspaces provide the compute layer that enables you to:

Upload content to the dataspace
Trigger AI-powered ingestion and extraction
Query content with natural language
Execute workflows and batch operations
Access structured metadata via API or SQL

See the Workspaces documentation for details on how workspaces access dataspace content.

Dataspace Architecture

Under the hood, each dataspace consists of:

Blob Storage:

Raw uploaded files
Extracted text and derivatives
Generated thumbnails and previews
Accessible via the cloud provider's native APIs

Relational Tables:

Content metadata (file names, types, upload dates)
User-provided metadata (arbitrary JSON)
AI-extracted metadata (from ontology-driven queries)
Supports direct SQL queries for analytics

Vector Store:

Embeddings for semantic search
Multiple chunk sizes for different query types
High-performance similarity search indexes

All three storage layers are transparently managed by Sync and accessed through a unified API.
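To illustrate what the vector store contributes, here is a toy similarity search over chunk embeddings. It is a brute-force sketch, not Sync's implementation: cosine similarity is a common choice for semantic search but is an assumption here, and the chunk IDs and three-dimensional vectors are made up. A production index would use an approximate nearest-neighbor structure instead of scanning every chunk.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query, chunks, k=2):
    """Return the IDs of the k chunks most similar to the query embedding."""
    ranked = sorted(
        chunks.items(),
        key=lambda item: cosine_similarity(query, item[1]),
        reverse=True,
    )
    return [chunk_id for chunk_id, _ in ranked[:k]]


# Hypothetical chunk embeddings keyed by chunk ID.
chunks = {
    "contract-p1": [1.0, 0.0, 0.0],
    "contract-p2": [0.8, 0.6, 0.0],
    "brochure-p1": [0.0, 0.0, 1.0],
}

nearest = top_k([1.0, 0.1, 0.0], chunks)
```

The chunk IDs returned by such a search would then be joined back to the relational metadata and the blobs they came from, which is why the three layers sit behind one API.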

Use Cases

Department Isolation: Create separate dataspaces for HR, Legal, Finance, and Engineering—each with its own ontology, access controls, and storage.

Customer Segmentation: In multi-tenant applications, create one dataspace per customer to ensure complete data isolation and independent scaling.

Project-Specific Repositories: Create dataspaces for specific initiatives (e.g., "Q4 Product Launch", "Audit 2024") with tailored ontologies and temporary lifecycles.

Compliance & Data Residency: Use separate dataspaces in different regions to comply with regional data sovereignty and privacy regulations (GDPR, CCPA, etc.).

Coming Soon: Remote Dataspaces

Remote dataspaces will enable you to point Sync at existing repositories without migrating data. Instead of uploading files to Sync's storage, you'll configure a remote dataspace to reference external systems like:

SharePoint: Sync content directly from SharePoint sites
Cloud Storage: Point to existing S3 buckets, Azure Blob Storage, or Google Cloud Storage
Network Drives: Access on-premises file shares
Document Management Systems: Integrate with existing ECM platforms

Remote dataspaces will enable Sync's AI capabilities (semantic search, metadata extraction, query agents) while keeping data in its original location. This is ideal for:

Organizations with existing large document repositories
Compliance scenarios requiring data to remain in specific systems
Hybrid deployments with on-premises and cloud storage

Status: Remote dataspaces are currently in development. Contact us to join the early access program.

Next Steps

Understand Workspaces - Learn how to access and process dataspace content
Explore Content - Understand the content model within dataspaces
Learn about Ontologies - Define schemas and validation rules for your dataspaces
API Reference - Complete API documentation for dataspaces
