Libraries

Libraries are lightweight, read-only knowledge sources that index external web content without storing copies. Think of them as dataspaces for content you don't manage: Sync crawls and indexes the content and makes it semantically searchable, while the original files remain at their source.

What is a Library?

A library is a collection of web pages indexed from a specified URL. Unlike dataspaces where you upload and own the content, libraries reference external content and make it queryable alongside your internal documents.

Key Characteristics:

No file storage: Sync indexes text and creates embeddings but doesn't store copies of source files
Web-accessible content: Works with any website, file, or documentation page that is reachable over the web
Current on demand: Can be re-indexed at any time to pick up changes at the source
Query-ready: Once a library is bound to a workspace and indexed, its content is available for AI queries and research agents

Libraries are perfect for incorporating external knowledge into your AI workflows without duplicating content or managing file uploads.

Creating a Library

Libraries are created via the Admin API by specifying a name, a root URL, and an optional URL filter pattern.

Example: Index Product Documentation

POST https://cloud.syncdocs.ai/api/accounts/{accountId}/libraries
Authorization: Bearer <token>
Content-Type: application/json

{
  "name": "Product Documentation",
  "rootUrl": "https://docs.example.com/",
  "urlFilter": "https://docs\\.example\\.com/(api|guides)/.*"
}

Response:

{
  "id": "lib-550e8400-uuid",
  "accountId": "scd-abc12345",
  "name": "Product Documentation",
  "rootUrl": "https://docs.example.com/",
  "urlFilter": "https://docs\\.example\\.com/(api|guides)/.*",
  "createdAt": "2024-10-28T10:00:00Z"
}

Parameters:

name (required): Human-readable library name
rootUrl (required): Starting point for crawling
urlFilter (optional): Regex pattern to limit which pages are indexed
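
The create request above can be assembled from Python using only the standard library. This is a minimal sketch, not an official client: the helper name build_create_library_request and the placeholder token are hypothetical, while the endpoint, headers, and body fields are copied from the example above. The request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://cloud.syncdocs.ai/api"  # host taken from the example above


def build_create_library_request(account_id, token, name, root_url, url_filter=None):
    """Build (but do not send) the POST request that creates a library.

    Hypothetical helper for illustration; not part of any official SDK.
    """
    body = {
        "name": name,
        "rootUrl": root_url,
        # None serializes to JSON null, the value this page uses for "no filter"
        "urlFilter": url_filter,
    }
    return urllib.request.Request(
        url=f"{API_BASE}/accounts/{account_id}/libraries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_create_library_request(
    account_id="scd-abc12345",   # placeholder account ID from the sample response
    token="YOUR_TOKEN",          # placeholder; substitute a real bearer token
    name="Product Documentation",
    root_url="https://docs.example.com/",
    # Raw string: the regex contains "\."; json.dumps doubles it to "\\."
    url_filter=r"https://docs\.example\.com/(api|guides)/.*",
)
```

To send it, pass the object to urllib.request.urlopen(request) and decode the JSON response; the id field of that response is what later calls refer to as the library ID.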

URL Filtering

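The urlFilter parameter is a regular expression that controls which crawled pages get indexed. Set it to null to index everything under the root URL. Because the pattern is sent as a JSON string, each regex backslash is doubled, so a literal dot is written as \\. in the request body: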

// Index everything under the root
{
  "rootUrl": "https://example.com/docs/",
  "urlFilter": null
}

// Index only specific sections
{
  "rootUrl": "https://example.com/",
  "urlFilter": "https://example\\.com/(docs|api|guides)/.*"
}

// Exclude certain paths
{
  "rootUrl": "https://example.com/docs/",
  "urlFilter": "https://example\\.com/docs/(?!internal/).*"
}
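
Before creating a library, it can help to check a filter against a few sample URLs locally. The sketch below uses Python's re module and makes two assumptions that this page does not confirm: that the whole URL must match the pattern (re.fullmatch), and that Sync's regex engine treats these patterns the same way Python does. The helper name is_indexed and the sample URLs are hypothetical.

```python
import re

# Patterns from the examples above, written as Python raw strings.
# In the JSON body each backslash is doubled ("\\."); the regex itself is "\.".
SECTIONS_ONLY = r"https://example\.com/(docs|api|guides)/.*"
EXCLUDE_INTERNAL = r"https://example\.com/docs/(?!internal/).*"


def is_indexed(url, url_filter):
    """Return True if the URL passes the filter.

    Assumption: the entire URL must match the pattern. A null filter
    (None) accepts every URL.
    """
    if url_filter is None:
        return True
    return re.fullmatch(url_filter, url) is not None


candidates = [
    "https://example.com/docs/setup",
    "https://example.com/blog/launch",
    "https://example.com/docs/internal/roadmap",
]
for url in candidates:
    # Prints the URL followed by the verdict of each filter
    print(url, is_indexed(url, SECTIONS_ONLY), is_indexed(url, EXCLUDE_INTERNAL))
```

The negative lookahead (?!internal/) rejects any path that continues with internal/ right after /docs/, which is how the third pattern excludes a subtree without listing every allowed one.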

Binding Libraries to Workspaces

Libraries must be bound to workspaces before they can be used in queries.

POST https://cloud.syncdocs.ai/api/accounts/{accountId}/workspaces/{workspaceId}/libraries
Authorization: Bearer <token>
Content-Type: application/json

{
  "libraryId": "lib-550e8400-uuid"
}

Once bound, the workspace will index the library content in the background. Indexing typically completes within minutes, depending on the size of the library.
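
A binding call could be scripted along the following lines. This is a sketch under stated assumptions: the helper name build_bind_request, the workspace ID, and the token are hypothetical placeholders, and only the endpoint and body shape come from the example above. This page does not describe a way to check indexing progress, so the sketch stops at building the request.

```python
import json
import urllib.request


def build_bind_request(account_id, workspace_id, library_id, token):
    """Build (but do not send) the POST request that binds a library to a workspace."""
    url = (
        "https://cloud.syncdocs.ai/api"
        f"/accounts/{account_id}/workspaces/{workspace_id}/libraries"
    )
    return urllib.request.Request(
        url=url,
        data=json.dumps({"libraryId": library_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


bind_request = build_bind_request(
    account_id="scd-abc12345",         # placeholder account ID
    workspace_id="YOUR_WORKSPACE_ID",  # placeholder workspace ID
    library_id="lib-550e8400-uuid",    # library ID from the create response
    token="YOUR_TOKEN",                # placeholder bearer token
)
```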

Using Libraries in Queries

Libraries are included in a query by listing their IDs in the context.libraries array of the request body.

Example: Query with External Documentation

POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/query
Authorization: Bearer <token>
Content-Type: application/json

{
  "query": "How do I configure SSL certificates for production deployment?",
  "context": {
    "libraries": ["lib-product-docs-uuid"],
    "contentFilters": {
      "categoryId": "cat-deployment-guides-uuid"
    }
  }
}

How It Works:

Sync performs semantic search across both your internal content (filtered by contentFilters) and the specified libraries
The AI receives relevant text chunks from both sources
The response includes citations from both internal documents and library pages

Response:

{
  "query": "How do I configure SSL certificates...",
  "response": "To configure SSL certificates for production deployment:\n\n1. Generate a certificate signing request (CSR) [Product Documentation: SSL Setup]\n2. Upload the certificate to your load balancer [Deployment Guide: Load Balancer Config]\n3. Configure your application to use HTTPS endpoints [Product Documentation: HTTPS Configuration]\n\nMake sure to test the certificate chain before deploying to production [Deployment Guide: Pre-deployment Checklist].",
  "analyzedDocumentCount": 12,
  "citedContent": [
    {
      "contentId": "550e8400-...",
      "fileName": "deployment-guide.pdf"
    }
  ],
  "citedLibraryPages": [
    {
      "id": "lib-page-12345-uuid",
      "url": "https://docs.example.com/ssl-setup",
      "title": "SSL Setup Guide"
    },
    {
      "id": "lib-page-67890-uuid",
      "url": "https://docs.example.com/https-config",
      "title": "HTTPS Configuration"
    }
  ]
}
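
A client will usually want to separate the two kinds of citation in this response. The following sketch reads a trimmed copy of the response above and collects file names from citedContent and titled links from citedLibraryPages. The function name collect_sources is hypothetical, and treating either list as possibly absent is a defensive assumption, not documented behavior.

```python
# Trimmed copy of the response shown above; values are the page's placeholders.
result = {
    "analyzedDocumentCount": 12,
    "citedContent": [
        {"contentId": "550e8400-...", "fileName": "deployment-guide.pdf"},
    ],
    "citedLibraryPages": [
        {
            "id": "lib-page-12345-uuid",
            "url": "https://docs.example.com/ssl-setup",
            "title": "SSL Setup Guide",
        },
        {
            "id": "lib-page-67890-uuid",
            "url": "https://docs.example.com/https-config",
            "title": "HTTPS Configuration",
        },
    ],
}


def collect_sources(response):
    """Split citations into internal file names and external library pages."""
    # .get(..., []) tolerates a response without one of the lists (assumption)
    files = [item["fileName"] for item in response.get("citedContent", [])]
    pages = [
        f'{page["title"]} ({page["url"]})'
        for page in response.get("citedLibraryPages", [])
    ]
    return files, pages


files, pages = collect_sources(result)
print("Internal documents:", files)
print("Library pages:", pages)
```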

Common Use Cases

Regulatory & Compliance Knowledge: Index government regulations, industry standards, or legal codes to answer compliance questions about your internal documents.

{
  "name": "HIPAA Regulations",
  "rootUrl": "https://hhs.gov/hipaa/",
  "urlFilter": "https://hhs\\.gov/hipaa/(for-professionals|index\\.html).*"
}

Product Documentation: Include vendor documentation, API references, or technical guides to help answer questions about integrating with external systems.

{
  "name": "Stripe API Documentation",
  "rootUrl": "https://stripe.com/docs/",
  "urlFilter": "https://stripe\\.com/docs/api/.*"
}

Industry Research: Index research publications, whitepapers, or market reports to provide broader context for internal analysis.

{
  "name": "Gartner Reports",
  "rootUrl": "https://gartner.com/reports/",
  "urlFilter": "https://gartner\\.com/reports/(technology|security)/.*"
}

Knowledge Bases: Incorporate internal wikis, help centers, or documentation sites alongside uploaded documents.

{
  "name": "Company Wiki",
  "rootUrl": "https://wiki.company.com/",
  "urlFilter": null
}

Re-indexing Libraries

Libraries can be re-indexed to stay current with source changes:

POST https://cloud.syncdocs.ai/api/accounts/{accountId}/libraries/{libraryId}/reindex
Authorization: Bearer <token>

This triggers a fresh crawl of the library's URLs. Existing chunks are replaced with updated content.
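
Because re-indexing is on demand, a scheduled job is one way to keep several libraries fresh. The sketch below only builds the requests: the helper name build_reindex_requests, the second library ID, and the token are hypothetical, and the endpoint is the one shown above. The request carries no body, mirroring the example.

```python
import urllib.request


def build_reindex_requests(account_id, library_ids, token):
    """Build one body-less POST request per library to trigger a fresh crawl."""
    return [
        urllib.request.Request(
            url=(
                "https://cloud.syncdocs.ai/api"
                f"/accounts/{account_id}/libraries/{library_id}/reindex"
            ),
            headers={"Authorization": f"Bearer {token}"},
            method="POST",
        )
        for library_id in library_ids
    ]


reindex_requests = build_reindex_requests(
    account_id="scd-abc12345",                             # placeholder account ID
    library_ids=["lib-550e8400-uuid", "lib-another-uuid"],  # second ID is made up
    token="YOUR_TOKEN",                                    # placeholder bearer token
)
for req in reindex_requests:
    print(req.get_method(), req.full_url)
```

A cron job or task scheduler could send these with urllib.request.urlopen at whatever interval matches how often the source site changes.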

Libraries vs. Dataspaces

Feature	Dataspaces	Libraries
Content Ownership	You upload and own files	References external web content
Storage	Full file storage in blob storage	Index data only (no file copies)
Metadata	Rich, arbitrary metadata	Basic page metadata (title, URL)
Ingestion	Explicit per-document ingestion	Automatic crawling and indexing
Updates	Manual re-upload	Re-index on demand
Access Control	Per-object permissions	Workspace-level bindings
Use Case	Internal content repository	External knowledge sources

Internal or Authenticated Content

If you need to index content behind authentication (e.g., internal wikis, intranets, or systems requiring login), Sync can help. Our team can work with you to:

Configure authentication for library crawling
Set up VPN or private network access
Index content from on-premises systems
Implement custom crawling logic for specialized systems

Contact us at support@syncdocs.ai to discuss your specific requirements.

Next Steps

Understand Queries - Use libraries in AI queries
Explore Agents - Configure default libraries for agents
Learn about Dataspaces - Compare with full content repositories
API Reference - Complete library API documentation