Schema Discovery & Custom Views

What is Schema Projection?

Schema Projection is DocIntell’s core differentiator: instead of receiving the full raw OCR output, you define exactly which fields you need and get back only that structured data.

The Problem with Traditional OCR

Traditional OCR APIs return everything they extract - bounding boxes, confidence scores, page coordinates - resulting in massive payloads:

// Traditional OCR: 50-page invoice → 45MB response
{
  "pages": [
    {
      "page_number": 1,
      "text_annotations": [
        {
          "description": "Invoice",
          "bounding_poly": {"vertices": [...]},
          "confidence": 0.99
        },
        // ... thousands more annotations
      ]
    },
    // ... 49 more pages
  ]
}

DocIntell’s Approach: Schema Projection

With DocIntell, you define which fields matter and get back structured data:

// DocIntell: Same 50-page invoice → 2KB response
{
  "document_id": "0194e123-4567-7890-abcd-ef1234567890",
  "document_type": "invoice",
  "view": "accounting_v1",
  "data": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "due_date": "2024-02-15",
    "vendor_name": "Acme Corporation",
    "total_amount": 15432.50,
    "line_items": [
      {
        "description": "Professional Services",
        "quantity": 80,
        "unit_price": 192.91,
        "total": 15432.50
      }
    ]
  }
}

Key Benefits:

20-2000x smaller payloads - Only the data you need, nothing more
Ingest once, query many ways - Create multiple views for the same document
Type-safe schemas - Well-defined field types with validation

Discover Available Document Types

Before creating views, discover what document types DocIntell supports and what fields are available for extraction.

List All Document Types

Get a high-level overview of all supported document types:

curl -X GET https://api.docintell.com/v1/schemas \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY"

Response:

{
  "schemas": [
    {
      "document_type": "capital_call",
      "name": "Capital Call Notice",
      "category": "fund_operations",
      "description": "Capital calls for fund contributions",
      "schema_version": "v1",
      "field_count": 12
    },
    {
      "document_type": "invoice",
      "name": "Invoice",
      "category": "accounting",
      "description": "Vendor invoices and bills",
      "schema_version": "v1",
      "field_count": 18
    },
    {
      "document_type": "k1",
      "name": "Schedule K-1",
      "category": "tax",
      "description": "IRS Schedule K-1 tax forms",
      "schema_version": "v1",
      "field_count": 24
    }
  ]
}
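
As a minimal sketch of working with this listing, the snippet below groups document types by their `category`. It runs against an abbreviated, hard-coded copy of the sample response above rather than a live call, and the helper name `group_by_category` is illustrative, not part of the DocIntell API.

```python
from collections import defaultdict

# Abbreviated copy of the sample GET /v1/schemas response shown above.
listing = {
    "schemas": [
        {"document_type": "capital_call", "category": "fund_operations", "field_count": 12},
        {"document_type": "invoice", "category": "accounting", "field_count": 18},
        {"document_type": "k1", "category": "tax", "field_count": 24},
    ]
}

def group_by_category(listing):
    """Map each category to the document types it contains (hypothetical helper)."""
    grouped = defaultdict(list)
    for schema in listing["schemas"]:
        grouped[schema["category"]].append(schema["document_type"])
    return dict(grouped)

by_category = group_by_category(listing)
print(by_category)
```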

Get Full Schema Definition

Retrieve the complete field definitions for a specific document type:

curl -X GET https://api.docintell.com/v1/schemas/invoice \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY"

Response:

{
  "document_type": "invoice",
  "name": "Invoice",
  "category": "accounting",
  "description": "Vendor invoices and bills",
  "schema_version": "v1",
  "fields": [
    {
      "field_name": "invoice_number",
      "field_type": "string",
      "severity": "hard",
      "is_nullable": false,
      "description": "Unique invoice identifier",
      "pattern": null
    },
    {
      "field_name": "invoice_date",
      "field_type": "date",
      "severity": "hard",
      "is_nullable": false,
      "description": "Date invoice was issued",
      "pattern": null
    },
    {
      "field_name": "due_date",
      "field_type": "date",
      "severity": "soft",
      "is_nullable": true,
      "description": "Payment due date",
      "pattern": null
    },
    {
      "field_name": "vendor_name",
      "field_type": "string",
      "severity": "hard",
      "is_nullable": false,
      "description": "Name of the vendor/supplier",
      "pattern": null
    },
    {
      "field_name": "vendor_address",
      "field_type": "string",
      "severity": "soft",
      "is_nullable": true,
      "description": "Vendor's billing address",
      "pattern": null
    },
    {
      "field_name": "total_amount",
      "field_type": "monetary",
      "severity": "hard",
      "is_nullable": false,
      "description": "Total invoice amount including tax",
      "pattern": null
    },
    {
      "field_name": "subtotal",
      "field_type": "monetary",
      "severity": "soft",
      "is_nullable": true,
      "description": "Subtotal before tax",
      "pattern": null
    },
    {
      "field_name": "tax_amount",
      "field_type": "monetary",
      "severity": "soft",
      "is_nullable": true,
      "description": "Total tax amount",
      "pattern": null
    },
    {
      "field_name": "currency",
      "field_type": "string",
      "severity": "soft",
      "is_nullable": true,
      "description": "Currency code (e.g., USD, EUR)",
      "pattern": "^[A-Z]{3}$"
    },
    {
      "field_name": "line_items",
      "field_type": "array",
      "severity": "soft",
      "is_nullable": true,
      "description": "Invoice line items with descriptions and amounts",
      "pattern": null
    }
  ],
  "validations": [
    {
      "name": "total_equals_subtotal_plus_tax",
      "severity": "soft",
      "message": "Total should equal subtotal plus tax",
      "fields_involved": ["total_amount", "subtotal", "tax_amount"]
    }
  ]
}
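
The `validations` entry describes a cross-field rule. This guide does not say how DocIntell evaluates it, so the following is only a client-side sketch of the same arithmetic check, using `Decimal` to avoid floating-point rounding. The function name `total_matches` and the sample amounts are assumptions made for illustration.

```python
from decimal import Decimal

def total_matches(data):
    """Return True/False if the rule can be checked, or None if a field is missing."""
    names = ("total_amount", "subtotal", "tax_amount")
    if any(data.get(name) is None for name in names):
        return None  # soft fields may be absent, so the rule cannot be evaluated
    # Convert via str() so 9.99 becomes Decimal("9.99") rather than a binary approximation.
    total, subtotal, tax = (Decimal(str(data[name])) for name in names)
    return total == subtotal + tax

# Hypothetical amounts, not taken from a real document.
consistent = total_matches({"total_amount": 110.00, "subtotal": 100.00, "tax_amount": 10.00})
inconsistent = total_matches({"total_amount": 110.00, "subtotal": 100.00, "tax_amount": 9.99})
unknown = total_matches({"total_amount": 110.00})
print(consistent, inconsistent, unknown)
```

Because the rule has `soft` severity, a mismatch is probably best treated as a warning for human review rather than a reason to discard the record.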

Understanding Field Definitions

Field	Description
`field_name`	Field identifier (snake_case) - use this in views
`field_type`	Data type: `string`, `decimal`, `date`, `monetary`, `boolean`, `integer`, `array`
`severity`	`hard` = required field (extraction fails if missing); `soft` = optional field (extraction continues if missing)
`is_nullable`	Whether the field can be `null` even if present
`description`	Human-readable explanation of the field
`pattern`	Regex validation pattern (if applicable)

Field Severity Matters:

Hard fields are critical and must be present for extraction to succeed
Soft fields are nice-to-have and won’t fail extraction if missing
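
To make the distinction concrete, here is a rough client-side sketch that checks an extracted record against field definitions in the shape returned by `GET /v1/schemas/invoice`. It mirrors the hard/soft and `pattern` semantics described above but is not DocIntell's own validation logic, and it ignores `is_nullable` for brevity. `check_record` and the sample record are hypothetical.

```python
import re

# Three field definitions copied from the sample invoice schema.
fields = [
    {"field_name": "invoice_number", "severity": "hard", "pattern": None},
    {"field_name": "due_date", "severity": "soft", "pattern": None},
    {"field_name": "currency", "severity": "soft", "pattern": "^[A-Z]{3}$"},
]

def check_record(fields, record):
    """Classify problems as errors (hard fields) or warnings (soft fields)."""
    errors, warnings = [], []
    for field in fields:
        name = field["field_name"]
        bucket = errors if field["severity"] == "hard" else warnings
        value = record.get(name)
        if value is None:
            bucket.append(f"{name}: missing")
        elif field["pattern"] and not re.fullmatch(field["pattern"], str(value)):
            bucket.append(f"{name}: does not match {field['pattern']}")
    return errors, warnings

# Hypothetical record: due_date is absent and currency is lower-case.
errors, warnings = check_record(fields, {"invoice_number": "INV-2024-001", "currency": "usd"})
print(errors)
print(warnings)
```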

Create Custom Views

Views define which fields you want to retrieve when querying document data. Think of a view as the column list of a SQL SELECT statement: it projects a subset of fields from the extracted data.
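
Conceptually, applying a view is a projection over the extracted record. The sketch below models that with a plain dictionary. How DocIntell implements views server-side is not described in this guide, and `project` is a hypothetical helper.

```python
def project(record, view_fields):
    """Keep only the fields named by the view, in the view's order."""
    return {name: record[name] for name in view_fields if name in record}

# A few fields from the sample invoice used throughout this guide.
extracted = {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "vendor_name": "Acme Corporation",
    "vendor_address": "123 Main St, Anytown, CA 94000",
    "total_amount": 15432.50,
}

summary = project(extracted, ["invoice_number", "total_amount"])
print(summary)
```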

Why Use Views?

Multiple Use Cases

Create different views for accounting, compliance, and auditing teams - all from the same extraction.

Reduced Payload Size

Only retrieve the fields you need. A “quick summary” view might return 5 fields instead of 50.

Separation of Concerns

Different teams see different data without re-processing the document.

Version Control

Name views like “accounting_v1” and “accounting_v2” to manage schema evolution.

Creating a View

Create a view by specifying the document type and which fields to include:

curl -X POST https://api.docintell.com/v1/views \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "document_type": "invoice",
    "name": "accounting_v1",
    "description": "Fields needed for accounts payable processing",
    "fields": [
      "invoice_number",
      "invoice_date",
      "due_date",
      "vendor_name",
      "total_amount",
      "currency"
    ],
    "is_default": true
  }'

Response:

{
  "view_id": "0194e456-7890-7abc-def0-123456789abc",
  "document_type": "invoice",
  "name": "accounting_v1",
  "description": "Fields needed for accounts payable processing",
  "fields": [
    "invoice_number",
    "invoice_date",
    "due_date",
    "vendor_name",
    "total_amount",
    "currency"
  ],
  "is_default": true,
  "created_at": "2024-01-15T10:30:00Z",
  "updated_at": "2024-01-15T10:30:00Z"
}

Default Views

Set is_default: true to make a view the default for its document type. When you query document data without specifying a view, the default view is used.

Each document type can have only one default view. Setting a new default automatically unsets the previous one.
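
The "one default per document type" rule can be modelled in a few lines. This is an in-memory illustration of the behaviour described above, not DocIntell's implementation; the `ViewRegistry` class is an assumption made for the example.

```python
class ViewRegistry:
    """In-memory model of 'one default view per document type' (illustrative only)."""

    def __init__(self):
        # (document_type, name) -> {"fields": [...], "is_default": bool}
        self.views = {}

    def create(self, document_type, name, fields, is_default=False):
        if is_default:
            # Unset any previous default for the same document type.
            for (doc_type, _), view in self.views.items():
                if doc_type == document_type:
                    view["is_default"] = False
        self.views[(document_type, name)] = {"fields": list(fields), "is_default": is_default}

    def default_for(self, document_type):
        for (doc_type, name), view in self.views.items():
            if doc_type == document_type and view["is_default"]:
                return name
        return None

registry = ViewRegistry()
registry.create("invoice", "accounting_v1", ["invoice_number", "total_amount"], is_default=True)
registry.create("invoice", "compliance_v1", ["invoice_number", "vendor_name"], is_default=True)
print(registry.default_for("invoice"))
```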

List Your Views

See all views you’ve created:

curl -X GET https://api.docintell.com/v1/views \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY"

Filter by document type:

curl -X GET "https://api.docintell.com/v1/views?document_type=invoice" \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY"

Update a View

Modify an existing view (fields, description, or default status):

curl -X PUT https://api.docintell.com/v1/views/0194e456-7890-7abc-def0-123456789abc \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "fields": [
      "invoice_number",
      "invoice_date",
      "vendor_name",
      "total_amount",
      "line_items"
    ],
    "description": "Updated to include line items for detailed analysis"
  }'

View names cannot be changed after creation. If you need a different name, create a new view and delete the old one.
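
A rename therefore becomes two calls. The sketch below only builds the request plan as `(method, path, body)` tuples and sends nothing. Creating the replacement before deleting the original is a design choice made here so that clients are never left without a usable view; the new name `ap_processing_v1` and the helper `rename_view_plan` are hypothetical.

```python
def rename_view_plan(view, new_name):
    """Return the two API calls needed to 'rename' a view, as (method, path, body) tuples."""
    body = {
        "document_type": view["document_type"],
        "name": new_name,
        "description": view["description"],
        "fields": view["fields"],
        "is_default": view["is_default"],
    }
    return [
        ("POST", "/v1/views", body),                        # create the replacement first
        ("DELETE", f"/v1/views/{view['view_id']}", None),   # then remove the old view
    ]

# The view returned by the create call earlier in this guide.
old_view = {
    "view_id": "0194e456-7890-7abc-def0-123456789abc",
    "document_type": "invoice",
    "name": "accounting_v1",
    "description": "Fields needed for accounts payable processing",
    "fields": ["invoice_number", "invoice_date", "due_date", "vendor_name", "total_amount", "currency"],
    "is_default": True,
}

plan = rename_view_plan(old_view, "ap_processing_v1")
for method, path, _ in plan:
    print(method, path)
```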

Delete a View

Remove a view you no longer need:

curl -X DELETE https://api.docintell.com/v1/views/0194e456-7890-7abc-def0-123456789abc \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY"

Response: 204 No Content

Query Data with Views

Once you’ve created views, use them to retrieve extracted document data filtered to exactly the fields you need.

Query with a Specific View

Retrieve document data using a named view:

curl -X GET "https://api.docintell.com/v1/documents/0194e123-4567-7890-abcd-ef1234567890/data?view=accounting_v1" \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY"

Response:

{
  "document_id": "0194e123-4567-7890-abcd-ef1234567890",
  "document_type": "invoice",
  "view": "accounting_v1",
  "data": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "due_date": "2024-02-15",
    "vendor_name": "Acme Corporation",
    "total_amount": 15432.50,
    "currency": "USD"
  },
  "field_metadata": null
}

Query with Default View

If you don’t specify a view, the default view for the document type is used:

# Uses the default view for the document type
curl -X GET "https://api.docintell.com/v1/documents/0194e123-4567-7890-abcd-ef1234567890/data" \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY"

If no default view exists, all fields are returned.
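
The resolution order described in this section (named view, then default view, then all fields) can be sketched as a small function. This models the documented behaviour on local data only; `resolve_fields` and the trimmed field lists are assumptions for illustration.

```python
def resolve_fields(all_fields, views, requested=None):
    """Pick the field list for a query: named view, else default view, else all fields."""
    if requested is not None:
        if requested not in views:
            # Corresponds to the 404 "View not found" error in the API.
            raise KeyError(f"View not found: {requested!r}")
        return views[requested]["fields"]
    for view in views.values():
        if view["is_default"]:
            return view["fields"]
    return all_fields

all_fields = ["invoice_number", "invoice_date", "due_date", "vendor_name", "total_amount", "currency"]
views = {
    "accounting_v1": {"fields": ["invoice_number", "total_amount"], "is_default": True},
    "compliance_v1": {"fields": ["invoice_number", "vendor_name"], "is_default": False},
}

print(resolve_fields(all_fields, views, "compliance_v1"))  # named view
print(resolve_fields(all_fields, views))                   # falls back to the default view
print(resolve_fields(all_fields, {}))                      # no views at all: every field
```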

Include Field Metadata

Get additional metadata for each field (confidence scores, page numbers, etc.):

curl -X GET "https://api.docintell.com/v1/documents/0194e123-4567-7890-abcd-ef1234567890/data?view=accounting_v1&include_metadata=true" \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY"

Response:

{
  "document_id": "0194e123-4567-7890-abcd-ef1234567890",
  "document_type": "invoice",
  "view": "accounting_v1",
  "data": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "due_date": "2024-02-15",
    "vendor_name": "Acme Corporation",
    "total_amount": 15432.50,
    "currency": "USD"
  },
  "field_metadata": {
    "invoice_number": {
      "confidence": 0.99,
      "page_number": 1,
      "bounding_box": {
        "x": 450,
        "y": 120,
        "width": 180,
        "height": 24
      }
    },
    "total_amount": {
      "confidence": 0.97,
      "page_number": 1,
      "bounding_box": {
        "x": 650,
        "y": 800,
        "width": 120,
        "height": 20
      }
    }
  }
}

Field metadata is returned only when you pass include_metadata=true. It is omitted by default to reduce payload size.
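
One plausible use of the metadata is routing low-confidence fields to manual review. The sketch below works on a trimmed copy of the response above. The 0.98 threshold and the helper name `fields_needing_review` are arbitrary choices for the example, not DocIntell recommendations.

```python
def fields_needing_review(response, threshold=0.98):
    """List fields whose extraction confidence is below the threshold."""
    # field_metadata is null when metadata was not requested.
    metadata = response.get("field_metadata") or {}
    return sorted(
        name for name, meta in metadata.items() if meta["confidence"] < threshold
    )

# Trimmed copy of the sample response (bounding boxes omitted).
response = {
    "data": {"invoice_number": "INV-2024-001", "total_amount": 15432.50},
    "field_metadata": {
        "invoice_number": {"confidence": 0.99, "page_number": 1},
        "total_amount": {"confidence": 0.97, "page_number": 1},
    },
}

flagged = fields_needing_review(response)
print(flagged)
```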

Query the Same Document with Different Views

This is where Schema Projection shines - query the same document multiple ways:

Example: Accounting vs. Compliance Views

Accounting View (6 fields for AP processing):

curl -X GET "https://api.docintell.com/v1/documents/{id}/data?view=accounting_v1" \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY"

{
  "view": "accounting_v1",
  "data": {
    "invoice_number": "INV-2024-001",
    "invoice_date": "2024-01-15",
    "due_date": "2024-02-15",
    "vendor_name": "Acme Corporation",
    "total_amount": 15432.50,
    "currency": "USD"
  }
}

Compliance View (8 fields for audit trail):

curl -X GET "https://api.docintell.com/v1/documents/{id}/data?view=compliance_v1" \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY"

{
  "view": "compliance_v1",
  "data": {
    "invoice_number": "INV-2024-001",
    "vendor_name": "Acme Corporation",
    "vendor_address": "123 Main St, Anytown, CA 94000",
    "vendor_tax_id": "12-3456789",
    "payment_terms": "Net 30",
    "purchase_order": "PO-2024-056",
    "approved_by": "Jane Smith",
    "approval_date": "2024-01-14"
  }
}

Same document, same extraction, different views - no re-processing.

Best Practices

1. Create Views for Each Use Case

Don’t use a single “all fields” view for everything. Create specific views for each team or workflow:

Accounting Team

accounting_v1: invoice_number, vendor_name, total_amount, due_date

Compliance Team

compliance_v1: vendor_tax_id, payment_terms, approved_by, approval_date

Audit Team

audit_v1: All financial fields + approval workflow fields

Quick Summary

summary_v1: Just 3-5 key fields for dashboards

2. Use Versioned View Names

Plan for schema evolution by adding a version suffix (_v1, _v2, ...) to your view names:

accounting_v1  → First version
accounting_v2  → Added line_items field
accounting_v3  → Added tax_breakdown field

This allows you to:

Migrate gradually - New code uses v2, old code continues using v1
A/B test schema changes - Compare v1 vs v2 side-by-side
Roll back if needed - Switch back to v1 if v2 has issues
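
With the `_vN` naming convention, client code can pick the newest version of a view family from the list of view names. This is a small sketch under that convention; `latest_version` is a hypothetical helper, and it compares versions numerically so that `_v10` sorts after `_v2`.

```python
import re

def latest_version(view_names, prefix):
    """Return the highest-numbered '<prefix>_vN' name, or None if there is none."""
    pattern = re.compile(rf"{re.escape(prefix)}_v(\d+)")
    versions = []
    for name in view_names:
        match = pattern.fullmatch(name)
        if match:
            versions.append((int(match.group(1)), name))
    return max(versions)[1] if versions else None

names = ["accounting_v1", "accounting_v2", "accounting_v10", "compliance_v1"]
newest = latest_version(names, "accounting")
print(newest)
```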

3. Set Default Views for Common Queries

Make your most common view the default:

{
  "name": "accounting_v1",
  "is_default": true
}

This simplifies client code:

# No need to specify view - uses default
# (BASE_URL and headers as defined in the complete example below)
response = requests.get(f"{BASE_URL}/documents/{doc_id}/data", headers=headers)

4. Validate Fields Before Creating Views

Always check the schema first to ensure your fields exist:

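import requests

BASE_URL = "https://api.docintell.com/v1"
headers = {"Authorization": "Bearer dk_live_YOUR_API_KEY"}

# 1. Get schema
schema = requests.get(f"{BASE_URL}/schemas/invoice", headers=headers).json()
available_fields = {f["field_name"] for f in schema["fields"]}

# 2. Validate your fields
desired_fields = ["invoice_number", "total_amount", "line_items"]
invalid_fields = [f for f in desired_fields if f not in available_fields]

if invalid_fields:
    print(f"Invalid fields: {invalid_fields}")
else:
    # 3. Create view
    requests.post(f"{BASE_URL}/views", headers=headers, json={
        "document_type": "invoice",
        "name": "accounting_v2",  # example name; add description / is_default as needed
        "fields": desired_fields,
    })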

5. Use `include_metadata` Sparingly

Only request field metadata when you actually need it (e.g., for quality review):

# ❌ Always including metadata adds unnecessary payload size
data = get_document_data(doc_id, view="accounting_v1", include_metadata=True)

# ✅ Only request metadata when needed
if needs_quality_review:
    data = get_document_data(doc_id, view="accounting_v1", include_metadata=True)
else:
    data = get_document_data(doc_id, view="accounting_v1")
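
The `get_document_data` helper above is not defined in this guide. One way to keep metadata opt-in is to build the query string so that `include_metadata` appears only when requested. The following is a sketch of the URL-building part only (no request is sent); `document_data_url` is a hypothetical name.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.docintell.com/v1"

def document_data_url(doc_id, view=None, include_metadata=False):
    """Build the data URL, adding only the query parameters that are needed."""
    params = {}
    if view is not None:
        params["view"] = view
    if include_metadata:
        params["include_metadata"] = "true"
    query = f"?{urlencode(params)}" if params else ""
    return f"{BASE_URL}/documents/{doc_id}/data{query}"

doc_id = "0194e123-4567-7890-abcd-ef1234567890"
plain_url = document_data_url(doc_id, view="accounting_v1")
review_url = document_data_url(doc_id, view="accounting_v1", include_metadata=True)
print(plain_url)
print(review_url)
```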

6. Document Your Views

Maintain a mapping of views to use cases in your documentation:

# DocIntell Views

## Invoices

- **accounting_v1** - AP processing (6 fields)
- **compliance_v1** - Vendor verification (8 fields)
- **audit_v1** - Full audit trail (15 fields)
- **summary_v1** - Dashboard display (3 fields)

## Capital Calls

- **fund_ops_v1** - Fund operations (10 fields)
- ...

Error Handling

Invalid Fields

If you try to create a view with fields that don’t exist in the schema:

{
  "error": "invalid_fields",
  "message": "The following fields are not available for document type 'invoice': invalid_field, another_bad_field",
  "invalid_fields": ["invalid_field", "another_bad_field"],
  "available_fields": [
    "invoice_number",
    "invoice_date",
    "vendor_name",
    "total_amount",
    "..."
  ]
}

HTTP Status: 400 Bad Request

Fix: Check the schema (GET /v1/schemas/invoice) for valid field names.

View Not Found

If you query with a view that doesn’t exist:

{
  "detail": "View not found: 'nonexistent_view' for document type 'invoice'"
}

HTTP Status: 404 Not Found

Fix: Check your view name or create the view first (POST /v1/views).

Document Type Not Found

If you try to create a view for an unsupported document type:

{
  "detail": "Document type not found: 'unsupported_type'"
}

HTTP Status: 404 Not Found

Fix: List available document types (GET /v1/schemas).

Document Not Ready

If you query data before extraction completes:

{
  "error": "document_not_ready",
  "message": "Document extraction not completed. Current status: processing",
  "status": "processing"
}

HTTP Status: 400 Bad Request

Fix: Wait for extraction to complete (check job status with GET /v1/jobs/{job_id}).
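
The error bodies above come in two shapes: some carry `error` and `message`, others only `detail`. A client may want to normalize them. The sketch below is one way to do that on already-parsed JSON; the `DocIntellError` class, the `not_found` and `unknown` codes, and the retry rule are assumptions for illustration, not part of the API.

```python
class DocIntellError(Exception):
    """Hypothetical client-side exception carrying the HTTP status and an error code."""

    def __init__(self, status, code, message):
        super().__init__(f"{status} {code}: {message}")
        self.status, self.code, self.message = status, code, message

def parse_error(status, body):
    """Normalize both error shapes shown above into a DocIntellError."""
    if "error" in body:
        return DocIntellError(status, body["error"], body.get("message", ""))
    # The 404 responses in this guide carry only a "detail" string.
    code = "not_found" if status == 404 else "unknown"
    return DocIntellError(status, code, body.get("detail", ""))

def is_retryable(error):
    """Only 'document_not_ready' is worth retrying; the others need a corrected request."""
    return error.code == "document_not_ready"

not_ready = parse_error(400, {
    "error": "document_not_ready",
    "message": "Document extraction not completed. Current status: processing",
    "status": "processing",
})
missing = parse_error(404, {"detail": "Document type not found: 'unsupported_type'"})
print(not_ready, is_retryable(not_ready))
print(missing, is_retryable(missing))
```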

Complete Example: End-to-End Workflow

Here’s a complete example showing schema discovery, view creation, and data querying:

import requests

API_KEY = "dk_live_YOUR_API_KEY"
BASE_URL = "https://api.docintell.com/v1"
headers = {"Authorization": f"Bearer {API_KEY}"}

# 1. Discover available document types
schemas = requests.get(f"{BASE_URL}/schemas", headers=headers).json()
print(f"Available document types: {[s['document_type'] for s in schemas['schemas']]}")

# 2. Get full schema for invoices
invoice_schema = requests.get(f"{BASE_URL}/schemas/invoice", headers=headers).json()
print(f"Invoice fields: {[f['field_name'] for f in invoice_schema['fields']]}")

# 3. Create an accounting view
accounting_view = requests.post(
    f"{BASE_URL}/views",
    headers=headers,
    json={
        "document_type": "invoice",
        "name": "accounting_v1",
        "description": "Fields for AP processing",
        "fields": [
            "invoice_number",
            "invoice_date",
            "due_date",
            "vendor_name",
            "total_amount",
            "currency"
        ],
        "is_default": True
    }
).json()
print(f"Created view: {accounting_view['view_id']}")

# 4. Create a compliance view
compliance_view = requests.post(
    f"{BASE_URL}/views",
    headers=headers,
    json={
        "document_type": "invoice",
        "name": "compliance_v1",
        "description": "Fields for vendor verification",
        "fields": [
            "invoice_number",
            "vendor_name",
            "vendor_address",
            "vendor_tax_id",
            "payment_terms",
            "approved_by"
        ],
        "is_default": False
    }
).json()
print(f"Created view: {compliance_view['view_id']}")

# 5. Upload a document (returns immediately with job_id)
with open("invoice.pdf", "rb") as f:
    upload_response = requests.post(
        f"{BASE_URL}/documents",
        headers=headers,
        files={"file": f},
        data={"retention_years": 7, "document_type": "invoice"}
    ).json()

document_id = upload_response["document_id"]
job_id = upload_response["job_id"]
print(f"Document uploaded: {document_id}, job: {job_id}")

# 6. Poll for job completion (in production, use webhooks)
import time
while True:
    job = requests.get(f"{BASE_URL}/jobs/{job_id}", headers=headers).json()
    if job["status"] == "completed":
        print("Extraction completed!")
        break
    elif job["status"] == "failed":
        print(f"Extraction failed: {job.get('error_message')}")
        raise SystemExit(1)
    time.sleep(2)

# 7. Query with accounting view
accounting_data = requests.get(
    f"{BASE_URL}/documents/{document_id}/data?view=accounting_v1",
    headers=headers
).json()
print(f"Accounting data: {accounting_data['data']}")

# 8. Query with compliance view (same document, different fields!)
compliance_data = requests.get(
    f"{BASE_URL}/documents/{document_id}/data?view=compliance_v1",
    headers=headers
).json()
print(f"Compliance data: {compliance_data['data']}")

Output:

Available document types: ['capital_call', 'invoice', 'k1', ...]
Invoice fields: ['invoice_number', 'invoice_date', 'vendor_name', ...]
Created view: 0194e456-7890-7abc-def0-123456789abc
Created view: 0194e456-7890-7abc-def0-123456789def
Document uploaded: 0194e123-4567-7890-abcd-ef1234567890, job: 0194e123-4567-7890-abcd-ef1234567891
Extraction completed!
Accounting data: {'invoice_number': 'INV-2024-001', 'total_amount': 15432.5, ...}
Compliance data: {'invoice_number': 'INV-2024-001', 'vendor_tax_id': '12-3456789', ...}
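
The polling loop in step 6 retries every two seconds indefinitely. If you poll rather than use webhooks, a timeout and a growing delay are a common refinement. The following sketch shows one way to do that. `wait_for_job` and its parameters are illustrative, and the status fetcher is passed in as a function so the example runs without network access.

```python
import time

def wait_for_job(fetch_status, timeout=300.0, initial_delay=1.0, max_delay=30.0, sleep=time.sleep):
    """Poll fetch_status() until the job finishes, doubling the delay after each attempt."""
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        job = fetch_status()
        if job["status"] == "completed":
            return job
        if job["status"] == "failed":
            raise RuntimeError(f"Extraction failed: {job.get('error_message')}")
        if time.monotonic() + delay > deadline:
            raise TimeoutError("Job did not finish before the timeout")
        sleep(delay)
        delay = min(delay * 2, max_delay)

# Simulated job that completes on the third poll; sleep is stubbed so the demo runs instantly.
statuses = iter(["processing", "processing", "completed"])
delays = []
job = wait_for_job(lambda: {"status": next(statuses)}, sleep=delays.append)
print(job["status"], delays)
```

In real use, `fetch_status` would wrap the `GET /v1/jobs/{job_id}` call from step 6, for example `lambda: requests.get(f"{BASE_URL}/jobs/{job_id}", headers=headers).json()`.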

Next Steps

Upload Your First Document

Start extracting data from PDFs

Webhook Setup

Get notified when extraction completes

API Reference

Full API documentation

Error Handling

Handle API errors gracefully
