Build a Research Agent

This guide shows you how to build an AI-powered legal research agent for a fictitious San Francisco law firm. The agent answers questions by searching across both private client case files and public legal reference libraries. Along the way, you'll see how Sync combines private and public data in one platform with built-in governance, privacy controls, and audit trails.

What You'll Build

A "SF Legal Research Assistant" that paralegals and attorneys can query about client cases while automatically pulling relevant California state statutes and San Francisco municipal regulations. The agent will:

Search across private client documents with metadata-based filters
Incorporate authoritative legal reference libraries
Maintain conversation history automatically
Provide full citation tracking for compliance and audit trails

What You'll Learn

How to create Libraries for external reference content (laws, regulations, public documentation)
How to create an Agent with custom instructions and default context
How to link libraries to agents for expanded search scope
How to query agents with metadata filters to scope queries to specific clients or case types
How automatic conversation history enables multi-turn research sessions
How to access query logs for compliance and audit purposes
Why this architecture uniquely enables privacy, governance, and ease of use

Prerequisites

Before starting, make sure you have:

Completed the Account Setup Guide
An active workspace (e.g., sws-x9p3q7r5)
A dataspace (e.g., sds-a1b2c3d4) containing client case documents
Client documents already uploaded and categorized with metadata including:
- clientId (e.g., "CLIENT-2024-789")
- caseNumber (e.g., "CASE-2024-1523")
- caseType (e.g., "Eviction Defense", "Tenant Rights", "Housing Dispute")
- jurisdiction (e.g., "San Francisco", "California")
Your authentication token ready
Your Account ID (format: scd-k2j8n4m1)

Note: if an ontology is configured, Sync adds this metadata automatically.

Understanding the Core Concepts

Before we dive into building, let's briefly understand the key concepts:

Libraries

A Library allows you to ingest and index external content (websites, documentation, regulations) that should be available to agents as read-only reference material. Unlike dataspaces, which store YOUR private documents, libraries store PUBLIC or LICENSED content that provides context for queries.

Why Use Libraries?

Separation of concerns: Keep client data (dataspace) separate from reference materials (libraries)
Reusability: One library can be used by multiple agents across different dataspaces
Governance: Libraries have different access controls than dataspaces—they're not tied to specific clients or cases

Agents

An Agent is a configured AI assistant with:

Instructions: Specific behavior, tone, domain expertise, and guidelines
Default context: Pre-configured libraries and search settings
Permissions: Implicitly defined by the dataspaces to which the querying user has access

Why Use Agents?

Customization: Different agents for different use cases (research, classification, extraction)
Access control: Agents act as a permission and behavior layer—users query agents, agents query data
Consistency: Same agent instructions across all queries ensure predictable, high-quality behavior
Audit trails: All queries are logged with agent ID for governance and compliance
Domain expertise: Encode specialized knowledge into agent instructions (legal, medical, financial, etc.)

Automatic Conversation History

Sync automatically manages conversation history for multi-turn dialogues:

Each new conversation gets a unique conversationId
Pass the conversationId to continue a conversation thread
Sync intelligently manages token limits and context summarization
Full conversation threads are stored and auditable

Why This Matters:

User experience: Natural, ChatGPT-like conversations without client-side state management
Efficiency: No need to manually pass chat history in your application
Governance: Full conversation threads are logged and auditable for compliance
Token optimization: Sync handles context window management automatically

Step 1: Create Your First Library (CA State Law)

Let's create a library that indexes California state law for your legal research agent.

Create the Library

POST https://cloud.syncdocs.ai/api/accounts/scd-k2j8n4m1/libraries
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "name": "California State Law",
  "rootUrl": "https://leginfo.legislature.ca.gov/",
  "urlFilter": "https://leginfo\\.legislature\\.ca\\.gov/(faces/codes|statutes).*"
}

Request Parameters:

name (required): Descriptive name for the library
rootUrl (required): Base URL to crawl and index
urlFilter (optional): Regular expression to limit which URLs are indexed (prevents crawling the entire domain)

Response:

{
  "id": "c7f85f64-9821-4a62-b8fc-1c963f66afa6",
  "name": "California State Law",
  "rootUrl": "https://leginfo.legislature.ca.gov/",
  "urlFilter": "https://leginfo\\.legislature\\.ca\\.gov/(faces/codes|statutes).*",
  "accountId": "scd-k2j8n4m1",
  "createdAt": "2025-01-25T10:00:00Z",
  "lastUpdatedAt": null
}

What Happens Next:

Sync begins crawling the specified URL automatically
Web pages are downloaded, text is extracted, and embeddings are generated
Content is indexed for semantic search
This process runs asynchronously in the background (can take minutes to hours depending on site size)

Save the library id (c7f85f64-9821-4a62-b8fc-1c963f66afa6) - you'll need it when configuring the agent.

Step 2: Create the SF Municipal Code Library

Now create a second library for San Francisco-specific regulations:

POST https://cloud.syncdocs.ai/api/accounts/scd-k2j8n4m1/libraries
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "name": "San Francisco Municipal Code",
  "rootUrl": "https://codelibrary.amlegal.com/codes/san_francisco/",
  "urlFilter": "https://codelibrary\\.amlegal\\.com/codes/san_francisco/.*"
}

Response:

{
  "id": "d8a96g75-1932-5b73-c9gd-2d074g77bgb7",
  "name": "San Francisco Municipal Code",
  "rootUrl": "https://codelibrary.amlegal.com/codes/san_francisco/",
  "urlFilter": "https://codelibrary\\.amlegal\\.com/codes/san_francisco/.*",
  "accountId": "scd-k2j8n4m1",
  "createdAt": "2025-01-25T10:05:00Z",
  "lastUpdatedAt": null
}

You now have two libraries that will provide legal reference context to your research agent:

✅ California State Law (c7f85f64-9821-4a62-b8fc-1c963f66afa6)
✅ San Francisco Municipal Code (d8a96g75-1932-5b73-c9gd-2d074g77bgb7)
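
Both library requests have the same shape, so they are easy to issue from code. The following is a minimal Python sketch (standard library only) that builds the Step 2 request without sending it, after checking the urlFilter locally against sample URLs. The helper names filter_accepts and build_create_library_request are illustrative, and the "outside" URL is made up for the test. The check assumes the filter must match the whole URL; this guide does not say how Sync applies the expression, so treat that as an assumption.

```python
import json
import re
import urllib.request

API_BASE = "https://cloud.syncdocs.ai/api"  # host used by the library requests in this guide


def filter_accepts(url_filter: str, url: str) -> bool:
    """Return True if the whole URL matches the filter (assumed semantics)."""
    return re.fullmatch(url_filter, url) is not None


def build_create_library_request(account_id, token, name, root_url, url_filter=None):
    """Build, but do not send, the POST request that creates a library."""
    payload = {"name": name, "rootUrl": root_url}
    if url_filter is not None:
        payload["urlFilter"] = url_filter  # optional field, omitted when unset
    return urllib.request.Request(
        url=f"{API_BASE}/accounts/{account_id}/libraries",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Same expression as the JSON body above; a raw string avoids double escaping.
SF_FILTER = r"https://codelibrary\.amlegal\.com/codes/san_francisco/.*"

# A page under the San Francisco code tree, and a hypothetical page outside it.
inside = "https://codelibrary.amlegal.com/codes/san_francisco/latest/sf_admin/0-0-0-17656"
outside = "https://codelibrary.amlegal.com/codes/oakland/latest/overview"
print(filter_accepts(SF_FILTER, inside), filter_accepts(SF_FILTER, outside))

request = build_create_library_request(
    account_id="scd-k2j8n4m1",
    token="<your-token>",
    name="San Francisco Municipal Code",
    root_url="https://codelibrary.amlegal.com/codes/san_francisco/",
    url_filter=SF_FILTER,
)
# To send it: urllib.request.urlopen(request), then json.load() the response body.
```

Checking the expression before creating the library is cheap, whereas a filter that is too broad or too narrow only shows up after a crawl that can take hours.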

Step 3: Create the Research Agent

Now let's create an agent that's specifically designed for legal research, with instructions that encode best practices and link it to both libraries.

POST https://cloud.syncdocs.ai/api/accounts/scd-k2j8n4m1/agents
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "name": "SF Legal Research Assistant",
  "description": "AI assistant specializing in California and San Francisco law for client case research",
  "instructions": "You are an expert legal research assistant specializing in California state law and San Francisco municipal law. Your role is to help attorneys and paralegals research client cases.\n\nGuidelines:\n1. Always cite specific statutes, codes, or ordinances with section numbers\n2. Clearly distinguish between information from client case files vs. legal reference sources\n3. When answering questions about specific clients, prioritize documents from that client's files\n4. Provide practical guidance relevant to San Francisco jurisdiction\n5. If legal precedent or statutory interpretation is unclear, explicitly state the ambiguity\n6. Recommend consulting an attorney for case-specific legal advice when appropriate\n7. Use formal, professional language suitable for legal documentation\n\nYou have access to:\n- Client case files (contracts, correspondence, filings, evidence)\n- California state statutes and codes\n- San Francisco municipal ordinances and regulations",
  "defaultContext": {
    "libraries": [
      "c7f85f64-9821-4a62-b8fc-1c963f66afa6",
      "d8a96g75-1932-5b73-c9gd-2d074g77bgb7"
    ],
    "includeWebSearchResults": false
  }
}

Request Parameters:

name (required): Display name for the agent
description (required): Short description of the agent's purpose
instructions (required): Detailed system prompt that defines the agent's behavior, tone, expertise, and guidelines
defaultContext.libraries (optional): Array of library IDs to include in every query by default
defaultContext.includeWebSearchResults (optional): Whether to augment queries with web search results (default: false)
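
Long instruction strings are hard to review inside a JSON body. One option is to assemble the payload in code. This is a sketch only: build_agent_payload is a hypothetical helper, the required-field check mirrors the parameter list above, and only the first two guidelines are included to keep it short.

```python
import json

REQUIRED_AGENT_FIELDS = ("name", "description", "instructions")


def build_agent_payload(name, description, instructions, libraries=(), include_web_search=False):
    """Assemble the JSON body for the create-agent request and check required fields."""
    payload = {
        "name": name,
        "description": description,
        "instructions": instructions,
        "defaultContext": {
            "libraries": list(libraries),
            "includeWebSearchResults": include_web_search,
        },
    }
    # Reject empty or whitespace-only values for the fields marked as required.
    missing = [field for field in REQUIRED_AGENT_FIELDS if not payload[field].strip()]
    if missing:
        raise ValueError(f"missing required agent fields: {', '.join(missing)}")
    return payload


# Keeping guidelines in a list makes them easier to reorder and review.
guidelines = [
    "Always cite specific statutes, codes, or ordinances with section numbers",
    "Clearly distinguish between information from client case files vs. legal reference sources",
]
instructions = (
    "You are an expert legal research assistant specializing in California state law "
    "and San Francisco municipal law.\n\nGuidelines:\n"
    + "\n".join(f"{number}. {text}" for number, text in enumerate(guidelines, start=1))
)

payload = build_agent_payload(
    name="SF Legal Research Assistant",
    description="AI assistant specializing in California and San Francisco law for client case research",
    instructions=instructions,
    libraries=[
        "c7f85f64-9821-4a62-b8fc-1c963f66afa6",
        "d8a96g75-1932-5b73-c9gd-2d074g77bgb7",
    ],
)
print(json.dumps(payload, indent=2))
```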

Response:

{
  "agentId": "550e8400-e29b-41d4-a716-446655440000",
  "accountId": "scd-k2j8n4m1",
  "name": "SF Legal Research Assistant",
  "description": "AI assistant specializing in California and San Francisco law for client case research",
  "instructions": "You are an expert legal research assistant specializing in California state law and San Francisco municipal law...",
  "defaultContext": {
    "libraries": [
      "c7f85f64-9821-4a62-b8fc-1c963f66afa6",
      "d8a96g75-1932-5b73-c9gd-2d074g77bgb7"
    ],
    "includeWebSearchResults": false
  },
  "createdBy": "123e4567-e89b-12d3-a456-426614174001",
  "createdAt": "2025-01-25T10:15:00Z",
  "lastUpdatedAt": null
}

What You've Configured:

✅ An agent with legal expertise encoded in its instructions
✅ Default access to both CA state law and SF municipal code libraries
✅ Professional tone and behavior guidelines for legal research
✅ Clear citation and source attribution requirements

Save the agentId (550e8400-e29b-41d4-a716-446655440000) - you'll use it for all queries.

Step 4: Query Your Agent (Basic Research)

Now let's query the agent with a general legal research question. This will search across BOTH client case files (in your dataspace) AND legal reference libraries.

POST https://sws-x9p3q7r5.syncdocs.ai/api/content/sds-a1b2c3d4/query
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "query": "What are the tenant notification requirements for eviction in San Francisco?",
  "agentId": "550e8400-e29b-41d4-a716-446655440000"
}

Request Parameters:

query (required): The question to ask
agentId (optional but recommended): The agent to use for this query (provides instructions and default library context)
context (optional): Additional context configuration (we'll see this in the next step)
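
The same request can be built from code. This Python sketch uses only the standard library and does not send anything. build_query_request is an illustrative name, and the URL pattern (workspace ID as subdomain, dataspace ID in the path) is inferred from the request above.

```python
import json
import urllib.request


def build_query_request(workspace_id, dataspace_id, token, query, agent_id=None, context=None):
    """Build, but do not send, a query request scoped to one dataspace."""
    body = {"query": query}
    if agent_id is not None:
        body["agentId"] = agent_id  # optional, but supplies instructions and libraries
    if context is not None:
        body["context"] = context  # optional per-query context
    return urllib.request.Request(
        url=f"https://{workspace_id}.syncdocs.ai/api/content/{dataspace_id}/query",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_query_request(
    workspace_id="sws-x9p3q7r5",
    dataspace_id="sds-a1b2c3d4",
    token="<your-token>",
    query="What are the tenant notification requirements for eviction in San Francisco?",
    agent_id="550e8400-e29b-41d4-a716-446655440000",
)
print(request.get_method(), request.full_url)
# To send it: urllib.request.urlopen(request), then json.load() the response body.
```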

What Happens Under the Hood:

Sync loads the agent's instructions and default context (libraries)
The agent searches across:
- Your dataspace (sds-a1b2c3d4) for client case files
- California State Law library for relevant statutes
- SF Municipal Code library for local ordinances
Vector embeddings are used to find semantically relevant content
The AI synthesizes an answer using the agent's instructions as guidance
Citations are extracted and full content objects are retrieved
A new conversation is automatically created with a unique conversationId

Response:

{
  "query": "What are the tenant notification requirements for eviction in San Francisco?",
  "response": "In San Francisco, tenant notification requirements for eviction are governed by both California state law and San Francisco's Rent Ordinance.\n\n**California State Law (Civil Code §1946.1):**\nFor month-to-month tenancies, landlords must provide:\n- 30-day notice if the tenant has resided in the unit for less than one year\n- 60-day notice if the tenant has resided in the unit for one year or more\n\n**San Francisco Rent Ordinance (§37.9):**\nSan Francisco provides additional tenant protections. Landlords can only evict tenants for \"just cause\" as defined in the ordinance. Notice requirements depend on the grounds for eviction:\n- For at-fault evictions (e.g., non-payment of rent): 3-day notice\n- For no-fault evictions (e.g., owner move-in): 60-day notice (120 days for elderly or disabled tenants)\n\nAll eviction notices must include specific language informing tenants of their rights under San Francisco law【0】【1】.",
  "analyzedDocumentCount": 47,
  "citedContent": [],
  "webSearchResults": [],
  "citedLibraryPages": [
    {
      "id": "9b7c5d3a-4f2e-4a8b-9c1d-2e3f4a5b6c7d",
      "libraryId": "c7f85f64-9821-4a62-b8fc-1c963f66afa6",
      "url": "https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1946.1",
      "title": "California Civil Code § 1946.1 - Notice to Terminate Tenancy",
      "domain": "leginfo.legislature.ca.gov",
      "contentType": "text/html",
      "scrapedAt": "2025-01-20T08:30:00Z",
      "relevanceScore": 0.94
    },
    {
      "id": "8a6b4c2d-3e1f-5b9a-8d0c-1e2f3a4b5c6e",
      "libraryId": "d8a96g75-1932-5b73-c9gd-2d074g77bgb7",
      "url": "https://codelibrary.amlegal.com/codes/san_francisco/latest/sf_admin/0-0-0-17656",
      "title": "San Francisco Administrative Code § 37.9 - Just Cause for Eviction",
      "domain": "codelibrary.amlegal.com",
      "contentType": "text/html",
      "scrapedAt": "2025-01-22T14:15:00Z",
      "relevanceScore": 0.91
    }
  ]
}

Understanding the Response:

response: AI-generated answer following the agent's instructions (formal legal tone, clear citations)
analyzedDocumentCount: 47 documents were available for analysis (client files + library pages)
citedContent: Empty in this case (no client case files were cited, only library pages)
citedLibraryPages: Two library pages were cited - one from CA law, one from SF municipal code
webSearchResults: Empty because includeWebSearchResults is false for this agent

Citation Format:

The citations 【0】 and 【1】 in the response reference indices in the citedLibraryPages array. This allows your UI to:

Display clickable citations
Show source documents in a sidebar
Provide users with direct links to authoritative sources
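
As a rough illustration, the markers can be resolved with a regular expression. The sketch below covers the case shown above, where only library pages are cited. How indices are assigned when citedContent is also populated is not specified here, so it makes no attempt to handle that. resolve_citations is a hypothetical helper, and the sample is trimmed from the response above.

```python
import re

CITATION_MARKER = re.compile(r"【(\d+)】")


def resolve_citations(response: dict) -> list:
    """Map each distinct 【n】 marker in the answer to its cited library page."""
    pages = response.get("citedLibraryPages", [])
    resolved = []
    seen = set()
    for match in CITATION_MARKER.finditer(response["response"]):
        index = int(match.group(1))
        if index in seen:
            continue  # report each source once, in order of first appearance
        seen.add(index)
        # Skip markers that point past the end of the array instead of failing.
        if index < len(pages):
            page = pages[index]
            resolved.append({"marker": index, "title": page["title"], "url": page["url"]})
    return resolved


sample = {
    "response": "All eviction notices must include specific language informing "
                "tenants of their rights under San Francisco law【0】【1】.",
    "citedLibraryPages": [
        {
            "title": "California Civil Code § 1946.1 - Notice to Terminate Tenancy",
            "url": "https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1946.1",
        },
        {
            "title": "San Francisco Administrative Code § 37.9 - Just Cause for Eviction",
            "url": "https://codelibrary.amlegal.com/codes/san_francisco/latest/sf_admin/0-0-0-17656",
        },
    ],
}

for citation in resolve_citations(sample):
    print(f"[{citation['marker']}] {citation['title']} <{citation['url']}>")
```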

Why This Matters:

Privacy: Client case files stayed in your dataspace and were searchable, but none were relevant for this general question
Comprehensive research: Agent automatically pulled from both state and local law
Audit trail: Full record of which sources were accessed
Professional output: Response follows legal research conventions (citations, clear structure)

Step 5: Query with Metadata Filters (Client-Specific Research)

Now let's ask a question scoped to a SPECIFIC client using metadata filters. This demonstrates Sync's powerful ability to dynamically limit search scope while still accessing library content.

POST https://sws-x9p3q7r5.syncdocs.ai/api/content/sds-a1b2c3d4/query
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "query": "Based on our case files, what is our strongest defense strategy for the eviction case?",
  "agentId": "550e8400-e29b-41d4-a716-446655440000",
  "context": {
    "contentFilters": {
      "clientId": "CLIENT-2024-789",
      "caseType": "Eviction Defense"
    }
  }
}

What Changed:

The context.contentFilters parameter tells Sync to ONLY search documents in the dataspace that match:

clientId = CLIENT-2024-789
caseType = Eviction Defense

Libraries are STILL searched (SF/CA law), but dataspace search is now narrowed to just this client's files.
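
One way to picture the filter is as an equality check on each document's metadata, where every key in contentFilters must match. The sketch below simulates that locally; it is not Sync's implementation, and the exact matching rules (case sensitivity, missing keys) are assumptions. The first document comes from this guide's example; the other two are hypothetical.

```python
def matches_filters(metadata: dict, content_filters: dict) -> bool:
    """True when every filter key is present in the metadata with an equal value."""
    return all(metadata.get(key) == value for key, value in content_filters.items())


documents = [
    {
        "fileName": "CLIENT-2024-789_Eviction_Notice.pdf",
        "metadata": {"clientId": "CLIENT-2024-789", "caseType": "Eviction Defense"},
    },
    {   # hypothetical: same client, different case type
        "fileName": "CLIENT-2024-789_Deposit_Dispute.pdf",
        "metadata": {"clientId": "CLIENT-2024-789", "caseType": "Housing Dispute"},
    },
    {   # hypothetical: different client, same case type
        "fileName": "CLIENT-2024-101_Eviction_Notice.pdf",
        "metadata": {"clientId": "CLIENT-2024-101", "caseType": "Eviction Defense"},
    },
]

content_filters = {"clientId": "CLIENT-2024-789", "caseType": "Eviction Defense"}

in_scope = [doc["fileName"] for doc in documents if matches_filters(doc["metadata"], content_filters)]
print(in_scope)
```

Under these assumptions only the first file stays in scope: the second shares the client but not the case type, and the third shares the case type but not the client.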

Why Use Metadata Filters:

Privacy by design: Ensure queries only access relevant client data
Compliance: Implement data isolation rules (attorney-client privilege, ethical walls)
Performance: Narrower search scope = faster, more relevant results
Flexibility: Same agent, different filters = different effective behaviors
Multi-tenancy: Support multiple clients/cases with one agent and one dataspace

Response:

{
  "query": "Based on our case files, what is our strongest defense strategy for the eviction case?",
  "response": "Based on the case files for CLIENT-2024-789, I recommend a defense strategy focused on procedural defects in the eviction notice:\n\n**Primary Defense: Insufficient Notice Period**\n\nYour client received a 30-day notice to vacate dated December 15, 2024【0】. However, tenant occupancy records show the client has resided at the property since March 2022【1】, which exceeds one year of residency.\n\nUnder California Civil Code §1946.1, tenants who have resided in a unit for one year or more must receive 60-day notice, not 30-day notice【2】. This procedural defect makes the eviction notice invalid.\n\n**Secondary Defense: Lack of Just Cause**\n\nThe landlord's stated reason for eviction is \"breach of lease\" related to unauthorized occupants【0】. However, correspondence from July 2024 shows the landlord was informed of the additional occupant (client's elderly parent) and did not object at that time【3】. San Francisco Rent Ordinance §37.9 requires clear just cause for eviction【4】. The landlord's delayed enforcement may undermine the just cause requirement.\n\n**Recommendation:**\n\nFile a motion to quash the eviction notice based on insufficient notice period. This is your strongest technical defense and has clear statutory support.",
  "analyzedDocumentCount": 23,
  "citedContent": [
    {
      "contentId": "7f3e5b89-a1c2-4d3e-b5f6-8a9b0c1d2e3f",
      "dataspaceId": "sds-a1b2c3d4",
      "fileName": "CLIENT-2024-789_Eviction_Notice.pdf",
      "fileFormat": "application/pdf",
      "categoryId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
      "metadata": {
        "clientId": "CLIENT-2024-789",
        "caseNumber": "CASE-2024-1523",
        "caseType": "Eviction Defense",
        "documentType": "Eviction Notice",
        "documentDate": "2024-12-15"
      },
      "createdAt": "2024-12-16T09:15:00Z",
      "updatedAt": "2024-12-16T09:15:00Z"
    },
    ...
  ],
  "webSearchResults": [],
  "citedLibraryPages": [
    {
      "id": "9b7c5d3a-4f2e-4a8b-9c1d-2e3f4a5b6c7d",
      "libraryId": "c7f85f64-9821-4a62-b8fc-1c963f66afa6",
      "url": "https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1946.1",
      "title": "California Civil Code § 1946.1 - Notice to Terminate Tenancy",
      "domain": "leginfo.legislature.ca.gov",
      "contentType": "text/html",
      "scrapedAt": "2025-01-20T08:30:00Z",
      "relevanceScore": 0.96
    },
    {
      "id": "8a6b4c2d-3e1f-5b9a-8d0c-1e2f3a4b5c6e",
      "libraryId": "d8a96g75-1932-5b73-c9gd-2d074g77bgb7",
      "url": "https://codelibrary.amlegal.com/codes/san_francisco/latest/sf_admin/0-0-0-17656",
      "title": "San Francisco Administrative Code § 37.9 - Just Cause for Eviction",
      "domain": "codelibrary.amlegal.com",
      "contentType": "text/html",
      "scrapedAt": "2025-01-22T14:15:00Z",
      "relevanceScore": 0.89
    }
  ]
}

Understanding the Filtered Response:

analyzedDocumentCount: Only 23 documents analyzed (down from 47) because we filtered to this specific client
citedContent: Now populated with 3 client documents - all matching our metadata filters
citedLibraryPages: Still includes relevant legal references from libraries
Agent behavior: The response now explicitly references client-specific facts ("Your client received...")

The Power of Metadata Filtering:

This query demonstrates Sync's unique value proposition:

Privacy Protection: Only documents for CLIENT-2024-789 were searched
Contextual Awareness: Agent combined client facts with legal standards
Governance: Full audit trail of which client files were accessed
Unified Platform: Single query interface for structured metadata filters + semantic search + library context

Without metadata filters, the agent might have accidentally mixed facts from different clients' cases—a serious ethical and legal violation. With filters, you get precision and compliance.

Step 6: Multi-Turn Conversation (Automatic History)

Sync automatically manages conversation history. Let's ask a follow-up question without repeating context.

Sync creates a conversation automatically when it answers the first query in a thread. To continue that thread, your application needs to keep the conversationId from that first query and send it with each follow-up. For this example, assume the previous query created a conversation with ID conv-a1b2c3d4-e5f6-7890-abcd-ef1234567890.

POST https://sws-x9p3q7r5.syncdocs.ai/api/content/sds-a1b2c3d4/query
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "query": "What evidence do we have to support the secondary defense?",
  "agentId": "550e8400-e29b-41d4-a716-446655440000",
  "conversationId": "conv-a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "context": {
    "contentFilters": {
      "clientId": "CLIENT-2024-789",
      "caseType": "Eviction Defense"
    }
  }
}

What's Different:

Added conversationId from the previous query
The question references "the secondary defense" without re-stating it
Same metadata filters maintained for consistency

What Happens:

Sync retrieves the conversation history
The agent understands "the secondary defense" refers to the "Lack of Just Cause" argument from the previous response
Context from the previous answer informs the new response
The conversation continues naturally
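
On the client side, the only state worth keeping is the agent, the filters, and the conversationId. A small session object is one possible way to hold them together so every follow-up carries the same scope. ResearchSession is a hypothetical class, and the conversation ID is set by hand here because this guide assumes it rather than showing where it is returned.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ResearchSession:
    """Keeps the agent, filters, and conversationId together across turns."""

    agent_id: str
    content_filters: dict = field(default_factory=dict)
    conversation_id: Optional[str] = None

    def next_body(self, query: str) -> dict:
        """Return the JSON body for the next query in this session."""
        body = {"query": query, "agentId": self.agent_id}
        if self.conversation_id is not None:
            body["conversationId"] = self.conversation_id
        if self.content_filters:
            # Copy so later changes to the session do not alter bodies already built.
            body["context"] = {"contentFilters": dict(self.content_filters)}
        return body


session = ResearchSession(
    agent_id="550e8400-e29b-41d4-a716-446655440000",
    content_filters={"clientId": "CLIENT-2024-789", "caseType": "Eviction Defense"},
)

# First turn: no conversationId yet, so a new conversation would be created.
first = session.next_body(
    "Based on our case files, what is our strongest defense strategy for the eviction case?"
)

# Follow-up: reuse the ID assumed in this step so the thread continues.
session.conversation_id = "conv-a1b2c3d4-e5f6-7890-abcd-ef1234567890"
second = session.next_body("What evidence do we have to support the secondary defense?")
print(second)
```

Holding the filters in the session also guards against a follow-up question silently widening the search beyond the original client.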

Response:

{
  "query": "What evidence do we have to support the secondary defense?",
  "response": "To support the secondary defense (lack of just cause due to delayed enforcement), we have strong documentary evidence:\n\n**Email Correspondence from July 15, 2024【0】:**\nYour client sent an email to the landlord notifying them that her elderly mother would be moving in temporarily for medical care. The landlord responded: \"Thanks for letting me know. I hope your mother recovers quickly.\"\n\nThis correspondence demonstrates:\n1. The landlord was explicitly informed of the additional occupant\n2. The landlord did not object or cite any lease violation at that time\n3. The landlord's acknowledgment could be construed as implicit consent\n\n**Current Eviction Notice (December 15, 2024)【1】:**\nCites \"unauthorized occupants\" as the breach of lease, but this occurred 5 months AFTER the landlord was informed and did not object.\n\n**Legal Analysis:**\n\nUnder San Francisco Rent Ordinance §37.9【2】, landlords must establish just cause for eviction. Delayed enforcement of lease terms (especially after apparent consent) may undermine the just cause requirement. Several tenant defense cases have successfully argued that landlord acquiescence waives the right to later enforce that provision.\n\n**Strategic Note:**\nWhile this defense is strong, it's secondary to the insufficient notice period defense, which has a clearer statutory basis and is less subject to judicial interpretation.",
  "analyzedDocumentCount": 23,
  "citedContent": [
    {
      "contentId": "9h5g7d01-c3e4-6f5g-d7h8-0c1d2e3f4g5h",
      "dataspaceId": "sds-a1b2c3d4",
      "fileName": "CLIENT-2024-789_Correspondence_July2024.pdf",
      "fileFormat": "application/pdf",
      "categoryId": "c3d4e5f6-g7h8-9012-cdef-gh3456789012",
      "metadata": {
        "clientId": "CLIENT-2024-789",
        "caseNumber": "CASE-2024-1523",
        "caseType": "Eviction Defense",
        "documentType": "Correspondence",
        "documentDate": "2024-07-15"
      },
      "createdAt": "2024-11-12T10:45:00Z",
      "updatedAt": "2024-11-12T10:45:00Z"
    },
    ...
  ],
  "webSearchResults": [],
  "citedLibraryPages": [
    {
      "id": "8a6b4c2d-3e1f-5b9a-8d0c-1e2f3a4b5c6e",
      "libraryId": "d8a96g75-1932-5b73-c9gd-2d074g77bgb7",
      "url": "https://codelibrary.amlegal.com/codes/san_francisco/latest/sf_admin/0-0-0-17656",
      "title": "San Francisco Administrative Code § 37.9 - Just Cause for Eviction",
      "domain": "codelibrary.amlegal.com",
      "contentType": "text/html",
      "scrapedAt": "2025-01-22T14:15:00Z",
      "relevanceScore": 0.87
    }
  ]
}

Step 7: Access Query Logs (Audit Trail)

Every query is automatically logged with comprehensive details for compliance, debugging, and usage tracking. Query logs provide:

Which user made which queries
Which documents were accessed
Which libraries were searched
Full request/response history
Timestamps and performance metrics

While the specific API for accessing query logs may vary by workspace configuration, Sync maintains detailed audit trails that typically include:

What's Logged for Each Query:

Query text and parameters
User ID and agent ID
Timestamp and response time
Dataspace and library IDs accessed
Content items accessed (with metadata filters applied)
Citations and sources used in response
Token usage and cost metrics
Conversation ID (for multi-turn tracking)

Why Query Logs are Essential:

Legal Compliance:
- Prove which attorney accessed which client files
- Demonstrate attorney-client privilege boundaries
- Track conflicts of interest (ethical walls)
Audit & Security:
- Detect unusual access patterns
- Monitor for potential data exfiltration
- Review user behavior for training purposes
Billing & Usage:
- Track query volume per client or per user
- Allocate costs to specific matters or projects
- Monitor API usage against quotas
Quality Assurance:
- Review query patterns to improve agent instructions
- Identify common questions to build training materials
- Debug user-reported issues by replaying exact queries
Research & Improvement:
- Analyze which sources are most frequently cited
- Understand which libraries provide the most value
- Refine metadata schemas based on actual filter usage

Example Log Entry (Conceptual):

{
  "queryId": "q-7a8b9c0d-1e2f-3a4b-5c6d-7e8f9a0b1c2d",
  "conversationId": "conv-a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "userId": "123e4567-e89b-12d3-a456-426614174001",
  "agentId": "550e8400-e29b-41d4-a716-446655440000",
  "query": "Based on our case files, what is our strongest defense strategy for the eviction case?",
  "timestamp": "2025-01-25T15:30:00Z",
  "dataspaceId": "sds-a1b2c3d4",
  "libraryIds": [
    "c7f85f64-9821-4a62-b8fc-1c963f66afa6",
    "d8a96g75-1932-5b73-c9gd-2d074g77bgb7"
  ],
  "contentFilters": {
    "clientId": "CLIENT-2024-789",
    "caseType": "Eviction Defense"
  },
  "documentsAccessed": [
    {
      "contentId": "7f3e5b89-a1c2-4d3e-b5f6-8a9b0c1d2e3f",
      "fileName": "CLIENT-2024-789_Eviction_Notice.pdf",
      "accessedAt": "2025-01-25T15:30:02Z"
    },
    {
      "contentId": "8g4f6c90-b2d3-5e4f-c6g7-9b0c1d2e3f4g",
      "fileName": "CLIENT-2024-789_Lease_Agreement.pdf",
      "accessedAt": "2025-01-25T15:30:02Z"
    }
  ],
  "libraryPagesAccessed": [
    {
      "libraryId": "c7f85f64-9821-4a62-b8fc-1c963f66afa6",
      "url": "https://leginfo.legislature.ca.gov/faces/codes_displaySection.xhtml?lawCode=CIV&sectionNum=1946.1",
      "accessedAt": "2025-01-25T15:30:03Z"
    }
  ],
  "responseTime": "4.2s",
  "tokensUsed": 3847,
  "analyzedDocumentCount": 23
}
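
If log entries follow roughly the conceptual shape above, simple compliance reports can be derived from them. The sketch below assumes that shape, which this guide itself labels conceptual, so field names may differ in practice. It answers two questions: which files each user's queries touched, and which queries ran without a clientId filter. The second entry is hypothetical and models an unfiltered query like the one in Step 4.

```python
from collections import defaultdict


def files_by_user(entries):
    """Map each userId to the sorted file names its queries accessed."""
    report = defaultdict(set)
    for entry in entries:
        for document in entry.get("documentsAccessed", []):
            report[entry["userId"]].add(document["fileName"])
    return {user: sorted(files) for user, files in report.items()}


def unscoped_queries(entries):
    """Return the queryIds of entries that ran without a clientId filter."""
    return [
        entry["queryId"]
        for entry in entries
        if "clientId" not in entry.get("contentFilters", {})
    ]


entries = [
    {   # trimmed from the conceptual entry above
        "queryId": "q-7a8b9c0d-1e2f-3a4b-5c6d-7e8f9a0b1c2d",
        "userId": "123e4567-e89b-12d3-a456-426614174001",
        "contentFilters": {"clientId": "CLIENT-2024-789", "caseType": "Eviction Defense"},
        "documentsAccessed": [
            {"fileName": "CLIENT-2024-789_Eviction_Notice.pdf"},
            {"fileName": "CLIENT-2024-789_Lease_Agreement.pdf"},
        ],
    },
    {   # hypothetical entry for a query sent without filters
        "queryId": "q-hypothetical-0002",
        "userId": "123e4567-e89b-12d3-a456-426614174001",
        "documentsAccessed": [],
    },
]

print(files_by_user(entries))
print(unscoped_queries(entries))
```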

Advanced: Override Default Libraries Per Query

Sometimes you want to use an agent but change which libraries are searched. Sync allows you to override the agent's default libraries on a per-query basis:

POST https://sws-x9p3q7r5.syncdocs.ai/api/content/sds-a1b2c3d4/query
Authorization: Bearer <your-token>
Content-Type: application/json

{
  "query": "What is the California statute of limitations for property damage claims?",
  "agentId": "550e8400-e29b-41d4-a716-446655440000",
  "context": {
    "libraries": ["c7f85f64-9821-4a62-b8fc-1c963f66afa6"]
  }
}

What Changed:

By specifying context.libraries, we override the agent's default library list. This query will:

Use the agent's instructions (legal expertise, citation style)
Search the dataspace for client documents
Search ONLY the CA State Law library (NOT the SF Municipal Code library)
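
The override rule can be summarized in a few lines. This is a local sketch of the behavior described above, not Sync's code: effective_libraries is a hypothetical helper, and it assumes that a query context without a libraries key leaves the agent's defaults in place.

```python
from typing import Optional

CA_STATE_LAW = "c7f85f64-9821-4a62-b8fc-1c963f66afa6"
SF_MUNICIPAL_CODE = "d8a96g75-1932-5b73-c9gd-2d074g77bgb7"


def effective_libraries(agent_default_context: dict, query_context: Optional[dict]) -> list:
    """Return the library IDs a query would search under the override rule."""
    if query_context and "libraries" in query_context:
        # A per-query list replaces the agent's defaults; it is not merged with them.
        return list(query_context["libraries"])
    return list(agent_default_context.get("libraries", []))


agent_defaults = {"libraries": [CA_STATE_LAW, SF_MUNICIPAL_CODE]}

# Per-query override: only the CA State Law library is searched.
print(effective_libraries(agent_defaults, {"libraries": [CA_STATE_LAW]}))

# No libraries key in the query context: the agent's defaults apply.
print(effective_libraries(agent_defaults, {"contentFilters": {"clientId": "CLIENT-2024-789"}}))
```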

Why This Is Useful:

Cost optimization: Don't search libraries you don't need for a specific query
Performance: Fewer sources = faster responses
Multi-jurisdiction: Easy to add/remove jurisdictions dynamically
Specialized queries: Use different reference materials for different question types

Summary: Why This Architecture

Throughout this guide, you've seen how Sync's research agent architecture uniquely enables:

1. Privacy & Data Isolation

Traditional Problems:

Client data mixed with reference materials in a single database
No easy way to ensure queries don't leak across clients
Difficult to implement ethical walls or privilege boundaries

How Sync Solves This:

✅ Client case files (dataspace) are physically separate from public references (libraries)
✅ Metadata filters enforce client-specific data access at query time
✅ Query logs provide complete audit trails of which documents were accessed
✅ Libraries can be shared across clients without privacy concerns

2. Governance & Compliance

Traditional Problems:

No audit trail of which AI queries accessed which documents
Difficult to prove compliance with privilege requirements
Can't track which legal sources informed which advice

How Sync Solves This:

✅ Every query logged with full details (user, agent, filters, documents accessed)
✅ Conversation threads stored for review and dispute resolution
✅ Citations link responses to specific source documents
✅ Metadata filters create technical enforcement of privilege boundaries

3. Ease of Use Across File Types

Traditional Problems:

Different systems for PDFs vs. Word docs vs. scanned images vs. web content
Complex ETL pipelines to make documents "AI-ready"
Separate interfaces for structured metadata vs. unstructured content

How Sync Solves This:

✅ Single query interface for PDFs, Word docs, scanned images, and scraped web content
✅ Libraries and dataspaces both searchable through the same API
✅ No preprocessing needed - upload documents and they're automatically processed
✅ Metadata queries combine with semantic search in one request

4. Flexible Context Management

Traditional Problems:

Hard-coded search scopes make systems inflexible
Each new use case requires new infrastructure
Difficult to combine multiple data sources in one query

How Sync Solves This:

✅ Metadata filters dynamically scope queries to specific clients/cases/time periods
✅ Libraries can be added/removed per query for flexible context
✅ Same agent serves multiple use cases by changing filters
✅ Conversation history automatically maintained across multi-turn dialogues

5. Unified Platform

Traditional Problems:

Document storage, AI processing, and search are separate products
Complex integrations between systems
Inconsistent governance models across tools

How Sync Solves This:

✅ Single platform for document storage, processing, metadata extraction, and querying
✅ One API for structured metadata AND semantic search AND library integration
✅ Consistent access control and audit logging across all operations
✅ No data movement between systems - everything happens in your VPC

What You've Built

Congratulations! You've created a production-ready legal research agent that:

✅ Searches across private client documents AND public legal references in one query
✅ Enforces client-specific data access through metadata filters
✅ Provides professional legal research output with proper citations
✅ Maintains conversation history for multi-turn research sessions
✅ Logs all activity for compliance and audit purposes
✅ Runs entirely in your infrastructure with complete data isolation

This same architecture can be adapted for:

Medical research agents (patient records + medical literature)
Financial analysis agents (client portfolios + market data)
Engineering support agents (internal docs + technical specifications)
Compliance review agents (company policies + regulatory databases)

Next Steps

Build on This Foundation

Automate Metadata Labeling

Ensure your client case files have proper metadata (clientId, caseType, etc.) for effective filtering. Learn how to use ontologies to automatically extract structured data from uploaded documents.

Deepen Your Understanding

Concepts: Agents - Advanced agent configuration, instruction engineering, and multi-agent systems
Concepts: Libraries - Library management, update strategies, and content curation
Concepts: Queries - Advanced query techniques, filter composition, and performance optimization
Concepts: Dataspaces - Data organization strategies and access control models
