
Libraries

Libraries are lightweight, read-only knowledge sources that index external web content without storing copies of it. Think of them as dataspaces for content you don't manage: Sync crawls and indexes the content and makes it semantically searchable, while the original files remain at their source.

What is a Library?

A library is a collection of web pages indexed from a specified URL. Unlike dataspaces where you upload and own the content, libraries reference external content and make it queryable alongside your internal documents.

Key Characteristics:

  • No file storage: Sync indexes text and creates embeddings but doesn't store copies of the source files
  • Web-accessible content: Works with publicly accessible websites, files, and documentation (content behind a login needs extra setup; see Internal or Authenticated Content)
  • Refreshable: Can be re-indexed on demand to pick up changes at the source
  • Query-ready: Once indexing finishes, the content is available to AI queries and research agents

Libraries are perfect for incorporating external knowledge into your AI workflows without duplicating content or managing file uploads.

Creating a Library

Libraries are created via the Admin API by specifying a root URL and an optional URL filter pattern.

Example: Index Product Documentation

POST https://cloud.syncdocs.ai/api/accounts/{accountId}/libraries
Authorization: Bearer <token>
Content-Type: application/json

{
  "name": "Product Documentation",
  "rootUrl": "https://docs.example.com/",
  "urlFilter": "https://docs\\.example\\.com/(api|guides)/.*"
}

Response:

{
  "id": "lib-550e8400-uuid",
  "accountId": "scd-abc12345",
  "name": "Product Documentation",
  "rootUrl": "https://docs.example.com/",
  "urlFilter": "https://docs\\.example\\.com/(api|guides)/.*",
  "createdAt": "2024-10-28T10:00:00Z"
}

Parameters:

  • name (required): Human-readable library name
  • rootUrl (required): Starting point for crawling
  • urlFilter (optional): Regex pattern to limit which pages are indexed
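A minimal Python sketch of the create call above, using only the standard library. It builds the request without sending it, so the payload can be inspected first. `build_create_library_request` is a hypothetical helper, and the account ID and token are placeholders taken from the example.

```python
import json
import urllib.request

ADMIN_BASE = "https://cloud.syncdocs.ai/api"

def build_create_library_request(account_id, token, name, root_url, url_filter=None):
    """Build (but do not send) a POST request that creates a library."""
    body = {"name": name, "rootUrl": root_url}
    # urlFilter is optional, so leave it out when no filter is given.
    if url_filter is not None:
        body["urlFilter"] = url_filter
    return urllib.request.Request(
        f"{ADMIN_BASE}/accounts/{account_id}/libraries",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_create_library_request(
    "scd-abc12345",  # placeholder account ID
    "<token>",       # placeholder bearer token
    "Product Documentation",
    "https://docs.example.com/",
    r"https://docs\.example\.com/(api|guides)/.*",
)
# To send it: urllib.request.urlopen(req), then json.load() on the response.
```

A raw Python string such as `r"https://docs\.example\.com/..."` serializes to the double-escaped form (`\\.`) shown in the JSON request, so the regex reaches the API unchanged.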

URL Filtering

The urlFilter parameter uses regex to control which pages get indexed:

Index everything under the root:

{
  "rootUrl": "https://example.com/docs/",
  "urlFilter": null
}

Index only specific sections:

{
  "rootUrl": "https://example.com/",
  "urlFilter": "https://example\\.com/(docs|api|guides)/.*"
}

Exclude certain paths (here, anything under /docs/internal/):

{
  "rootUrl": "https://example.com/docs/",
  "urlFilter": "https://example\\.com/docs/(?!internal/).*"
}
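Because a mistyped pattern silently drops pages, it can help to check a filter locally before creating the library. This sketch assumes that Sync matches `urlFilter` against each page's full absolute URL and that its regex dialect behaves like Python's `re` for these patterns, including the `(?!internal/)` lookahead. Neither assumption is confirmed here, so treat the result as a rough preview. Note that the JSON `\\.` becomes `\.` in a Python raw string.

```python
import re

def preview_filter(url_filter, urls):
    """Split candidate URLs into (kept, skipped) lists for a urlFilter.

    Assumptions: a null filter keeps every URL, and a non-null filter
    must match the entire URL (re.fullmatch), not just a prefix.
    """
    if url_filter is None:
        return list(urls), []
    pattern = re.compile(url_filter)
    kept, skipped = [], []
    for url in urls:
        (kept if pattern.fullmatch(url) else skipped).append(url)
    return kept, skipped

candidates = [
    "https://example.com/docs/setup",
    "https://example.com/docs/internal/roadmap",
    "https://example.com/blog/launch",
]
# The "exclude certain paths" filter, written as a Python raw string.
kept, skipped = preview_filter(r"https://example\.com/docs/(?!internal/).*", candidates)
```

Here `kept` holds only the `/docs/setup` page: the lookahead rejects the internal page, and the blog page falls outside `/docs/`.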

Binding Libraries to Workspaces

Libraries must be bound to workspaces before they can be used in queries.

POST https://cloud.syncdocs.ai/api/accounts/{accountId}/workspaces/{workspaceId}/libraries
Authorization: Bearer <token>
Content-Type: application/json

{
  "libraryId": "lib-550e8400-uuid"
}

Once bound, the workspace will index the library content in the background. Indexing typically completes within minutes, depending on the size of the library.
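Binding follows the same request pattern as creation. In this sketch, `build_bind_library_request` is a hypothetical helper, the workspace ID is a placeholder, and the library ID is the `id` returned by the create call. Because indexing runs in the background after binding, a query sent immediately afterward may not see the library's pages yet.

```python
import json
import urllib.request

def build_bind_library_request(account_id, workspace_id, library_id, token):
    """Build (but do not send) a POST that binds a library to a workspace."""
    url = (
        "https://cloud.syncdocs.ai/api/accounts/"
        f"{account_id}/workspaces/{workspace_id}/libraries"
    )
    return urllib.request.Request(
        url,
        data=json.dumps({"libraryId": library_id}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

bind_req = build_bind_library_request(
    "scd-abc12345",       # placeholder account ID
    "ws-1234",            # hypothetical workspace ID
    "lib-550e8400-uuid",  # id from the create response
    "<token>",
)
```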

Using Libraries in Queries

Libraries are included in queries via the context.libraries parameter.

Example: Query with External Documentation

POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/query
Authorization: Bearer <token>
Content-Type: application/json

{
  "query": "How do I configure SSL certificates for production deployment?",
  "context": {
    "libraries": ["lib-product-docs-uuid"],
    "contentFilters": {
      "categoryId": "cat-deployment-guides-uuid"
    }
  }
}
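The query request can be assembled the same way. Note that the host is workspace-specific (`sws-{workspaceId}`). The workspace and dataspace IDs below are hypothetical placeholders. The sketch also assumes that `contentFilters` can be omitted when you don't need to narrow internal content; the example request always includes it, so confirm this before relying on it.

```python
import json
import urllib.request

def build_query_request(workspace_id, dataspace_id, token, query,
                        library_ids, content_filters=None):
    """Build (but do not send) a dataspace query that also searches libraries."""
    context = {"libraries": list(library_ids)}
    # Assumption: contentFilters may be left out when no narrowing is needed.
    if content_filters:
        context["contentFilters"] = content_filters
    url = (
        f"https://sws-{workspace_id}.cloud.syncdocs.ai"
        f"/api/content/{dataspace_id}/query"
    )
    return urllib.request.Request(
        url,
        data=json.dumps({"query": query, "context": context}).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

query_req = build_query_request(
    workspace_id="ws-1234",  # hypothetical
    dataspace_id="ds-5678",  # hypothetical
    token="<token>",
    query="How do I configure SSL certificates for production deployment?",
    library_ids=["lib-product-docs-uuid"],
    content_filters={"categoryId": "cat-deployment-guides-uuid"},
)
```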

How It Works:

  1. Sync performs semantic search across both your internal content (filtered by contentFilters) and the specified libraries
  2. The AI receives relevant text chunks from both sources
  3. The response includes citations from both internal documents and library pages

Response:

{
  "query": "How do I configure SSL certificates...",
  "response": "To configure SSL certificates for production deployment:\n\n1. Generate a certificate signing request (CSR) [Product Documentation: SSL Setup]\n2. Upload the certificate to your load balancer [Deployment Guide: Load Balancer Config]\n3. Configure your application to use HTTPS endpoints [Product Documentation: HTTPS Configuration]\n\nMake sure to test the certificate chain before deploying to production [Deployment Guide: Pre-deployment Checklist].",
  "analyzedDocumentCount": 12,
  "citedContent": [
    {
      "contentId": "550e8400-...",
      "fileName": "deployment-guide.pdf"
    }
  ],
  "citedLibraryPages": [
    {
      "id": "lib-page-12345-uuid",
      "url": "https://docs.example.com/ssl-setup",
      "title": "SSL Setup Guide"
    },
    {
      "id": "lib-page-67890-uuid",
      "url": "https://docs.example.com/https-config",
      "title": "HTTPS Configuration"
    }
  ]
}
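Citations arrive in two separate lists: `citedContent` for internal documents and `citedLibraryPages` for library pages. A small helper can merge them into one list of sources for display. `collect_sources` is hypothetical, and treating a missing key as an empty list is an assumption rather than documented behavior.

```python
def collect_sources(response):
    """Return (kind, label, reference) tuples for every citation in a response.

    Internal documents come from "citedContent"; library pages come from
    "citedLibraryPages". A missing key is treated as an empty list (assumption).
    """
    sources = []
    for doc in response.get("citedContent", []):
        sources.append(("document", doc["fileName"], doc["contentId"]))
    for page in response.get("citedLibraryPages", []):
        sources.append(("library page", page["title"], page["url"]))
    return sources

# Trimmed copy of the example response above.
sample_response = {
    "citedContent": [
        {"contentId": "550e8400-...", "fileName": "deployment-guide.pdf"},
    ],
    "citedLibraryPages": [
        {"id": "lib-page-12345-uuid", "url": "https://docs.example.com/ssl-setup",
         "title": "SSL Setup Guide"},
        {"id": "lib-page-67890-uuid", "url": "https://docs.example.com/https-config",
         "title": "HTTPS Configuration"},
    ],
}

for kind, label, ref in collect_sources(sample_response):
    print(f"{kind}: {label} ({ref})")
```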

Common Use Cases

Regulatory & Compliance Knowledge: Index government regulations, industry standards, or legal codes to answer compliance questions about your internal documents.

{
  "name": "HIPAA Regulations",
  "rootUrl": "https://hhs.gov/hipaa/",
  "urlFilter": "https://hhs\\.gov/hipaa/(for-professionals|index\\.html).*"
}

Product Documentation: Include vendor documentation, API references, or technical guides to help answer questions about integrating with external systems.

{
  "name": "Stripe API Documentation",
  "rootUrl": "https://stripe.com/docs/",
  "urlFilter": "https://stripe\\.com/docs/api/.*"
}

Industry Research: Index research publications, whitepapers, or market reports to provide broader context for internal analysis.

{
  "name": "Gartner Reports",
  "rootUrl": "https://gartner.com/reports/",
  "urlFilter": "https://gartner\\.com/reports/(technology|security)/.*"
}

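Knowledge Bases: Incorporate wikis, help centers, or documentation sites alongside uploaded documents. Publicly reachable sites can be indexed directly; wikis that require a login need the setup described under Internal or Authenticated Content.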

{
  "name": "Company Wiki",
  "rootUrl": "https://wiki.company.com/",
  "urlFilter": null
}

Re-indexing Libraries

Libraries can be re-indexed to stay current with source changes:

POST https://cloud.syncdocs.ai/api/accounts/{accountId}/libraries/{libraryId}/reindex
Authorization: Bearer <token>

This triggers a fresh crawl of the library's URLs. Existing chunks are replaced with updated content.
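The re-index call takes no request body, so the sketch below sends only the authorization header. `build_reindex_request` is a hypothetical helper, and the IDs are the placeholders used earlier on this page.

```python
import urllib.request

def build_reindex_request(account_id, library_id, token):
    """Build (but do not send) the POST that triggers a fresh crawl."""
    return urllib.request.Request(
        f"https://cloud.syncdocs.ai/api/accounts/{account_id}"
        f"/libraries/{library_id}/reindex",
        headers={"Authorization": f"Bearer {token}"},  # no body, no Content-Type
        method="POST",
    )

reindex_req = build_reindex_request("scd-abc12345", "lib-550e8400-uuid", "<token>")
# urllib.request.urlopen(reindex_req) would send it.
```

How often to re-index is up to you. Because each call triggers a fresh crawl, a schedule that roughly matches how often the source changes (for example, a nightly job for frequently updated docs) is a reasonable starting point.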

Libraries vs. Dataspaces

| Feature | Dataspaces | Libraries |
|---|---|---|
| Content Ownership | You upload and own files | References external web content |
| Storage | Full file storage in blob storage | Index data only (no file copies) |
| Metadata | Rich, arbitrary metadata | Basic page metadata (title, URL) |
| Ingestion | Explicit per-document ingestion | Automatic crawling and indexing |
| Updates | Manual re-upload | Re-index on demand |
| Access Control | Per-object permissions | Workspace-level bindings |
| Use Case | Internal content repository | External knowledge sources |

Internal or Authenticated Content

If you need to index content behind authentication (e.g., internal wikis, intranets, or systems requiring login), Sync can help. Our team can work with you to:

  • Configure authentication for library crawling
  • Set up VPN or private network access
  • Index content from on-premises systems
  • Implement custom crawling logic for specialized systems

Contact us at support@syncdocs.ai to discuss your specific requirements.

Next Steps