Workspaces

A Workspace is Sync's compute layer—a scalable cluster that handles all document processing, AI-powered queries, ingestion pipelines, and API access to your data. Workspaces are the engines that transform raw documents in dataspaces into AI-ready, searchable, queryable content.

What is a Workspace?

Think of a workspace as a dedicated compute cluster that sits between your application and your data. While dataspaces store your content and metadata, workspaces do all the computational heavy lifting:

  • API Hosting: Serve the REST APIs your applications use to interact with content
  • Document Processing: Extract text, generate embeddings, create thumbnails
  • AI Operations: Execute queries, run research agents, perform semantic search
  • Workflow Execution: Orchestrate multi-step ingestion and extraction pipelines
  • Batch Operations: Process large volumes of documents in parallel

Workspaces are stateless—all persistent data lives in dataspaces. This means you can scale workspaces up or down, create new ones, or destroy existing ones without any data loss.

Key Characteristics

Scalable Compute Clusters: Each workspace is a fully managed compute cluster (typically running on Kubernetes) that can be scaled independently based on workload:

  • Scale up for high-volume ingestion
  • Scale down during idle periods
  • Create dedicated workspaces for different teams or use cases
  • No impact on data storage when scaling

Multi-Dataspace Access: A single workspace can access multiple dataspaces, subject to user permissions. This allows you to:

  • Query across multiple dataspaces simultaneously
  • Move content between dataspaces
  • Aggregate data from different departments or projects
  • Apply consistent processing pipelines across dataspaces

Network Isolation: Workspaces run in isolated virtual private clouds (VPCs) with strict network segmentation:

  • Each workspace has its own VPC (or shares a VPC with other workspaces in the same account)
  • No direct network connectivity between different customer workspaces
  • Encrypted connections to dataspaces and external services
  • Private networking for all database and blob storage access

Model Provider Flexibility: By default, workspaces perform all AI computations within the VPC using self-hosted models. However, customers can optionally configure workspaces to use external model providers (e.g., OpenAI, Anthropic, Cohere) for specific operations:

  • Default (VPC-only): All models run within the VPC—data never leaves your environment
  • External Providers: Optionally send text to external APIs for embeddings, extractions, or queries
  • Bring Your Own Model (BYOM): Use your own API keys for external providers to control costs and usage

Creating a Workspace

Workspaces are created via the Admin API and deployed in the cloud provider's infrastructure.

Example: Create a Workspace

POST https://cloud.syncdocs.ai/api/accounts/{accountId}/workspaces
Authorization: Bearer <token>
Content-Type: application/json

{
  "name": "Production Workspace",
  "description": "Primary workspace for production applications",
  "plan": "us-east-1",
  "resources": {
    "nodeCount": 3,
    "nodeSize": "medium"
  }
}

Response:

{
  "id": "sws-12345678-abcd-1234-efgh-123456789012",
  "accountId": "acc-98765432-dcba-4321-hgfe-987654321098",
  "name": "Production Workspace",
  "description": "Primary workspace for production applications",
  "plan": "us-east-1",
  "clusterStatus": "CREATING",
  "apiUrl": "https://sws-12345678.cloud.syncdocs.ai",
  "publicDbURL": "postgres://user:pass@host:5432/workspace_sws_12345678",
  "createdAt": "2024-10-28T14:30:00Z",
  "updatedAt": "2024-10-28T14:30:00Z"
}

Parameters:

  • name (required): Human-readable name for the workspace
  • description (optional): Description of the workspace's purpose
  • plan (optional): Cloud region for deployment (e.g., us-east-1, eu-west-1)
  • resources (optional): Compute resource configuration
    • nodeCount: Number of compute nodes (default: 2)
    • nodeSize: Size of each node (small, medium, large)
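
As a concrete sketch, the same request from Python might look like the following (using the requests library; the token and workspace name are placeholders, and only name is required):

import requests

ADMIN_BASE = "https://cloud.syncdocs.ai/api"
ACCOUNT_ID = "acc-98765432-dcba-4321-hgfe-987654321098"  # example account id
TOKEN = "..."  # Admin API bearer token (placeholder)

# Create a workspace with only the required "name" field; omitted options
# fall back to their defaults (e.g. nodeCount defaults to 2).
resp = requests.post(
    f"{ADMIN_BASE}/accounts/{ACCOUNT_ID}/workspaces",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"name": "Dev Workspace"},
)
resp.raise_for_status()
workspace = resp.json()
print(workspace["id"], workspace["clusterStatus"])  # e.g. "sws-..." "CREATING"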

Workspace Status

Workspace creation is asynchronous and typically takes 5-10 minutes. The clusterStatus field indicates the current state:

  • CREATING: Cluster infrastructure is being provisioned
  • READY: Workspace is fully operational
  • UPDATING: Configuration changes are being applied
  • ERROR: Provisioning or configuration failed

You can poll the workspace status:

GET https://cloud.syncdocs.ai/api/accounts/{accountId}/workspaces/{workspaceId}
Authorization: Bearer <token>
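
Since creation can take several minutes, a client typically polls until the status settles. A minimal polling sketch in Python follows (the interval and timeout are arbitrary choices, not platform requirements):

import time
import requests

def wait_until_ready(account_id, workspace_id, token, timeout_s=900):
    """Poll the workspace until clusterStatus reaches READY (or fails)."""
    url = f"https://cloud.syncdocs.ai/api/accounts/{account_id}/workspaces/{workspace_id}"
    headers = {"Authorization": f"Bearer {token}"}
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        status = requests.get(url, headers=headers).json()["clusterStatus"]
        if status == "READY":
            return
        if status == "ERROR":
            raise RuntimeError("workspace provisioning failed")
        time.sleep(30)  # creation typically takes 5-10 minutes
    raise TimeoutError("workspace did not reach READY before the timeout")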

Using a Workspace

Once a workspace is READY, you access it via its dedicated API URL.

Workspace API URL

Each workspace has a unique subdomain-based URL, returned as the apiUrl field of the workspace object:

https://{workspaceId}.cloud.syncdocs.ai

All Workspace API operations use this base URL. For example:

# Upload content to a dataspace
POST https://sws-12345678.cloud.syncdocs.ai/api/content/{dataspaceId}

# Query content with AI
POST https://sws-12345678.cloud.syncdocs.ai/api/content/{dataspaceId}/query

# List content in a dataspace
GET https://sws-12345678.cloud.syncdocs.ai/api/content/{dataspaceId}
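
For illustration, calling the query endpoint from Python might look like the sketch below. The request body shown (a single query field) and the dataspace id are assumptions for the example; consult the Workspace API reference for the actual schema:

import requests

WORKSPACE_API = "https://sws-12345678.cloud.syncdocs.ai/api"  # apiUrl from the create response
DATASPACE_ID = "dsp-example"  # hypothetical dataspace id
TOKEN = "..."  # bearer token, assumed to use the same scheme as the Admin API

resp = requests.post(
    f"{WORKSPACE_API}/content/{DATASPACE_ID}/query",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"query": "Summarize the renewal terms in our Q3 contracts"},  # body schema assumed
)
resp.raise_for_status()
print(resp.json())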

Access Control

Workspaces enforce object-level access control. When you call a workspace API with your authentication token, the workspace:

  1. Validates your token with the control plane
  2. Determines which dataspaces and objects your request is trying to access
  3. Restricts the operation to the dataspaces and objects you have permission to access

This means:

  • ✅ Users can only access dataspaces they have permissions for
  • ✅ Multiple users can share the same workspace with different access levels
  • ✅ Workspace compute resources are shared, but data access is isolated per user

Model Providers & API Keys

Workspaces offer flexible options for AI model usage.

Default: VPC-Only Compute

By default, all AI operations run within the VPC:

  • Text extraction, chunking, and embedding generation happen locally
  • No data leaves your VPC
  • No external API calls for document processing
  • Ideal for compliance, data sovereignty, and cost control

Optional: External Model Providers

You can configure a workspace to use external model providers (OpenAI, Anthropic, Cohere, etc.) for specific operations:

  • Embeddings: Use OpenAI's text-embedding-3-small for vector generation
  • Query Agents: Use GPT-4 or Claude for complex research queries
  • Data Extraction: Use advanced models for structured data extraction

Bring Your Own Model (BYOM): Customers can provide their own API keys for external providers to control costs and usage.

Privacy Considerations: When using external providers, the workspace sends text content and any metadata to the external API. Raw files remain within the VPC.
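
The configuration surface for external providers is not documented in this section. Purely as an illustration of the BYOM idea, such a setup might be applied through the workspace update endpoint along these lines; the modelProviders block and its field names are hypothetical:

import requests

ADMIN_BASE = "https://cloud.syncdocs.ai/api"
ACCOUNT_ID = "acc-98765432-dcba-4321-hgfe-987654321098"
WORKSPACE_ID = "sws-12345678-abcd-1234-efgh-123456789012"
TOKEN = "..."  # Admin API bearer token (placeholder)

# Hypothetical illustration only: "modelProviders" and its fields are NOT
# part of the documented API; they sketch what a BYOM configuration could
# look like, with your own OpenAI key used only for embedding generation.
requests.patch(
    f"{ADMIN_BASE}/accounts/{ACCOUNT_ID}/workspaces/{WORKSPACE_ID}",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "modelProviders": {
            "embeddings": {
                "provider": "openai",
                "model": "text-embedding-3-small",
                "apiKey": "sk-...",  # your own key (BYOM)
            }
        }
    },
).raise_for_status()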

Scaling Workspaces

Workspaces can be scaled independently of dataspaces:

Vertical Scaling (Resize Nodes):

PATCH https://cloud.syncdocs.ai/api/accounts/{accountId}/workspaces/{workspaceId}
Authorization: Bearer <token>
Content-Type: application/json

{
  "resources": {
    "nodeSize": "large"
  }
}

Horizontal Scaling (Add Nodes):

PATCH https://cloud.syncdocs.ai/api/accounts/{accountId}/workspaces/{workspaceId}
Authorization: Bearer <token>
Content-Type: application/json

{
  "resources": {
    "nodeCount": 5
  }
}
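
For example, a bulk-ingestion job might scale the workspace up, run the ingestion, and scale back down afterwards. A sketch in Python (node counts and sizes here are illustrative; after a PATCH the cluster passes through UPDATING before returning to READY):

import requests

ADMIN_BASE = "https://cloud.syncdocs.ai/api"
ACCOUNT_ID = "acc-98765432-dcba-4321-hgfe-987654321098"
WORKSPACE_ID = "sws-12345678-abcd-1234-efgh-123456789012"
TOKEN = "..."  # Admin API bearer token (placeholder)

def set_resources(resources):
    """PATCH the workspace's compute resources."""
    url = f"{ADMIN_BASE}/accounts/{ACCOUNT_ID}/workspaces/{WORKSPACE_ID}"
    requests.patch(
        url,
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"resources": resources},
    ).raise_for_status()

set_resources({"nodeCount": 5, "nodeSize": "large"})   # scale up for ingestion
# ... run the bulk ingestion against the workspace API ...
set_resources({"nodeCount": 2, "nodeSize": "medium"})  # scale back down afterwards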

Use Cases for Scaling:

  • Increase capacity during bulk ingestion
  • Scale down during off-peak hours to reduce costs
  • Temporarily boost resources for complex research queries
  • Create dedicated high-performance workspaces for critical applications

Multiple Workspaces

Organizations often create multiple workspaces for different purposes:

Development vs. Production:

  • dev-workspace: Testing new workflows and ontologies
  • prod-workspace: Production applications with SLA guarantees

Department-Specific Workspaces:

  • hr-workspace: Optimized for HR department use cases
  • legal-workspace: Configured for legal team workflows

Geographic Distribution:

  • us-workspace: Deployed in US region for low-latency US access
  • eu-workspace: Deployed in EU region for GDPR compliance

Each workspace can access the same dataspaces (subject to permissions), but provides isolated compute resources.
