# Libraries

**Libraries** are lightweight, read-only knowledge sources that index external web content without storing copies. Think of them as dataspaces for content you don't manage: Sync crawls, indexes, and makes the content semantically searchable, but the original files remain at their source.

## What is a Library?

A library is a collection of web pages indexed from a specified URL. Unlike dataspaces, where you upload and own the content, libraries reference external content and make it queryable alongside your internal documents.

**Key Characteristics**:

- **No file storage**: Sync indexes text and creates embeddings but doesn't store copies of source files
- **Web-accessible content**: Works with any web-accessible website, file, or documentation
- **Always current**: Can be re-indexed to stay up-to-date with source changes
- **Query-ready**: Indexed content is immediately available for AI queries and research agents

Libraries are perfect for incorporating external knowledge into your AI workflows without duplicating content or managing file uploads.

## Creating a Library

Libraries are created via the Admin API by specifying a root URL and an optional filter pattern.

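Before sending a creation request, it can help to assemble the body in code and preview which URLs a filter would admit. The following is a minimal Python sketch, not Sync's implementation. The helper names `build_library_payload` and `preview_filter` are hypothetical, and the matching semantics (the pattern must match the whole URL, and a null filter accepts every candidate) are assumptions; the regex dialect and matching rules Sync actually applies may differ.

```python
import json
import re


def build_library_payload(name, root_url, url_filter=None):
    """Assemble the JSON body for a library-creation request (hypothetical helper)."""
    return {"name": name, "rootUrl": root_url, "urlFilter": url_filter}


def preview_filter(url_filter, candidate_urls):
    """Return the candidate URLs that the filter would accept.

    Assumption: the pattern must match the whole URL (re.fullmatch) and a
    null filter accepts every candidate. Restricting results to pages under
    the root URL is not modeled here.
    """
    if url_filter is None:
        return list(candidate_urls)
    pattern = re.compile(url_filter)
    return [url for url in candidate_urls if pattern.fullmatch(url)]


payload = build_library_payload(
    "Product Documentation",
    "https://docs.example.com/",
    r"https://docs\.example\.com/(api|guides)/.*",
)

# json.dumps doubles each backslash, which yields the "\\." form used in JSON bodies.
body = json.dumps(payload, indent=2)

sample_urls = [
    "https://docs.example.com/api/auth",
    "https://docs.example.com/guides/quickstart",
    "https://docs.example.com/blog/release-notes",
]
accepted = preview_filter(payload["urlFilter"], sample_urls)

print(body)
print(accepted)
```

Writing the pattern as a raw string and letting the JSON encoder handle escaping avoids hand-doubling backslashes, which is an easy place to introduce a filter that silently matches nothing.
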
### Example: Index Product Documentation

```bash
POST https://cloud.syncdocs.ai/api/accounts/{accountId}/libraries
Authorization: Bearer
Content-Type: application/json

{
  "name": "Product Documentation",
  "rootUrl": "https://docs.example.com/",
  "urlFilter": "https://docs\\.example\\.com/(api|guides)/.*"
}
```

**Response**:

```json
{
  "id": "lib-550e8400-uuid",
  "accountId": "scd-abc12345",
  "name": "Product Documentation",
  "rootUrl": "https://docs.example.com/",
  "urlFilter": "https://docs\\.example\\.com/(api|guides)/.*",
  "createdAt": "2024-10-28T10:00:00Z"
}
```

**Parameters**:

- `name` (required): Human-readable library name
- `rootUrl` (required): Starting point for crawling
- `urlFilter` (optional): Regex pattern to limit which pages are indexed

### URL Filtering

The `urlFilter` parameter uses regex to control which pages get indexed:

```json
// Index everything under the root
{
  "rootUrl": "https://example.com/docs/",
  "urlFilter": null
}

// Index only specific sections
{
  "rootUrl": "https://example.com/",
  "urlFilter": "https://example\\.com/(docs|api|guides)/.*"
}

// Exclude certain paths
{
  "rootUrl": "https://example.com/docs/",
  "urlFilter": "https://example\\.com/docs/(?!internal/).*"
}
```

## Binding Libraries to Workspaces

Libraries must be bound to workspaces before they can be used in queries.

```bash
POST https://cloud.syncdocs.ai/api/accounts/{accountId}/workspaces/{workspaceId}/libraries
Authorization: Bearer
Content-Type: application/json

{
  "libraryId": "lib-550e8400-uuid"
}
```

Once bound, the workspace will index the library content in the background. Indexing typically completes within minutes, depending on the size of the library.

## Using Libraries in Queries

Libraries are included in queries via the `context.libraries` parameter.

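In client code, the `context` object might be assembled along these lines. This is a rough Python sketch: `build_query_body` is a hypothetical helper, and only the field names (`query`, `context.libraries`, `context.contentFilters`) are taken from the documented request shape.

```python
import json


def build_query_body(query, library_ids, content_filters=None):
    """Assemble a query body that searches the listed libraries.

    Hypothetical helper; only the field names come from the documented request.
    """
    context = {"libraries": list(library_ids)}
    # contentFilters is only added when filters are supplied.
    if content_filters:
        context["contentFilters"] = dict(content_filters)
    return {"query": query, "context": context}


body = build_query_body(
    "How do I configure SSL certificates for production deployment?",
    ["lib-product-docs-uuid"],
    {"categoryId": "cat-deployment-guides-uuid"},
)
print(json.dumps(body, indent=2))
```
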
### Example: Query with External Documentation

```bash
POST https://sws-{workspaceId}.cloud.syncdocs.ai/api/content/{dataspaceId}/query
Authorization: Bearer
Content-Type: application/json

{
  "query": "How do I configure SSL certificates for production deployment?",
  "context": {
    "libraries": ["lib-product-docs-uuid"],
    "contentFilters": {
      "categoryId": "cat-deployment-guides-uuid"
    }
  }
}
```

**How It Works**:

1. Sync performs semantic search across both your internal content (filtered by `contentFilters`) and the specified libraries
2. The AI receives relevant text chunks from both sources
3. The response includes citations from both internal documents and library pages

**Response**:

```json
{
  "query": "How do I configure SSL certificates...",
  "response": "To configure SSL certificates for production deployment:\n\n1. Generate a certificate signing request (CSR) [Product Documentation: SSL Setup]\n2. Upload the certificate to your load balancer [Deployment Guide: Load Balancer Config]\n3. Configure your application to use HTTPS endpoints [Product Documentation: HTTPS Configuration]\n\nMake sure to test the certificate chain before deploying to production [Deployment Guide: Pre-deployment Checklist].",
  "analyzedDocumentCount": 12,
  "citedContent": [
    {
      "contentId": "550e8400-...",
      "fileName": "deployment-guide.pdf"
    }
  ],
  "citedLibraryPages": [
    {
      "id": "lib-page-12345-uuid",
      "url": "https://docs.example.com/ssl-setup",
      "title": "SSL Setup Guide"
    },
    {
      "id": "lib-page-67890-uuid",
      "url": "https://docs.example.com/https-config",
      "title": "HTTPS Configuration"
    }
  ]
}
```

## Common Use Cases

**Regulatory & Compliance Knowledge**: Index government regulations, industry standards, or legal codes to answer compliance questions about your internal documents.

```json
{
  "name": "HIPAA Regulations",
  "rootUrl": "https://hhs.gov/hipaa/",
  "urlFilter": "https://hhs\\.gov/hipaa/(for-professionals|index\\.html).*"
}
```

**Product Documentation**: Include vendor documentation, API references, or technical guides to help answer questions about integrating with external systems.

```json
{
  "name": "Stripe API Documentation",
  "rootUrl": "https://stripe.com/docs/",
  "urlFilter": "https://stripe\\.com/docs/api/.*"
}
```

**Industry Research**: Index research publications, whitepapers, or market reports to provide broader context for internal analysis.

```json
{
  "name": "Gartner Reports",
  "rootUrl": "https://gartner.com/reports/",
  "urlFilter": "https://gartner\\.com/reports/(technology|security)/.*"
}
```

**Knowledge Bases**: Incorporate internal wikis, help centers, or documentation sites alongside uploaded documents.

```json
{
  "name": "Company Wiki",
  "rootUrl": "https://wiki.company.com/",
  "urlFilter": null
}
```

## Re-indexing Libraries

Libraries can be re-indexed to stay current with source changes:

```bash
POST https://cloud.syncdocs.ai/api/accounts/{accountId}/libraries/{libraryId}/reindex
Authorization: Bearer
```

This triggers a fresh crawl of the library's URLs. Existing chunks are replaced with updated content.

## Libraries vs. Dataspaces

| Feature | Dataspaces | Libraries |
| --- | --- | --- |
| **Content Ownership** | You upload and own files | References external web content |
| **Storage** | Full file storage in blob storage | Index data only (no file copies) |
| **Metadata** | Rich, arbitrary metadata | Basic page metadata (title, URL) |
| **Ingestion** | Explicit per-document ingestion | Automatic crawling and indexing |
| **Updates** | Manual re-upload | Re-index on demand |
| **Access Control** | Per-object permissions | Workspace-level bindings |
| **Use Case** | Internal content repository | External knowledge sources |

## Internal or Authenticated Content

If you need to index content behind authentication (e.g., internal wikis, intranets, or systems requiring login), Sync can help. Our team can work with you to:

- Configure authentication for library crawling
- Set up VPN or private network access
- Index content from on-premises systems
- Implement custom crawling logic for specialized systems

**Contact us** at [support@syncdocs.ai](mailto:support@syncdocs.ai) to discuss your specific requirements.

## Next Steps

- [Understand Queries](/concepts/queries) - Use libraries in AI queries
- [Explore Agents](/concepts/agents) - Configure default libraries for agents
- [Learn about Dataspaces](/concepts/dataspaces) - Compare with full content repositories
- [API Reference](/api) - Complete library API documentation