After DocIntell processes your document, you receive two types of results: classification (what type of document it is) and extraction (the actual data pulled from the document). This guide explains how to retrieve and understand these results.

What You Get After Processing

Once a document completes processing (status: completed), you have access to:
  1. Classification - What type of document was detected and why
  2. Extraction - The actual data extracted from the document
  3. Field Metadata - Confidence scores, page numbers, and source locations for each field
  4. Validation Results - Whether the extraction passed validation rules

Getting Full Extraction Results

Use GET /v1/jobs/{job_id}/results to retrieve the complete extraction output for a job:
curl -X GET https://api.docintell.com/v1/jobs/550e8400-e29b-41d4-a716-446655440000/results \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY"
When to use /results vs /documents/{id}/data:
  • Use /jobs/{job_id}/results for full extraction output with all metadata
  • Use /documents/{id}/data for filtered data based on your custom views (see Views Guide)
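The curl request above can also be prepared from Python. This is a minimal sketch using only the standard library. The helper name build_results_request and the BASE_URL constant are illustrative and not part of any DocIntell SDK. The sketch only builds the request; sending it is left to the caller.

```python
import urllib.request

# Base URL taken from the curl example; adjust if your environment differs.
BASE_URL = "https://api.docintell.com/v1"


def build_results_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a GET request for a job's full results."""
    url = f"{BASE_URL}/jobs/{job_id}/results"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


request = build_results_request(
    "550e8400-e29b-41d4-a716-446655440000", "dk_live_YOUR_API_KEY"
)
# To send it: urllib.request.urlopen(request), then json.load() the response body.
```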

Understanding Classification

The classification tells you what type of document was detected and why.

Classification Fields

  • document_type (string, required) - The detected document type code (e.g., invoice, capital_call, k1)
  • confidence (number, required) - Classification confidence score from 0.0 to 1.0 (higher is more confident)
  • reasoning (string, required) - 1-2 sentence explanation of why this type was chosen
  • citation (string) - Direct quote from the document that supports the classification
  • citation_page (integer) - Page number where the citation was found (1-indexed)

Example Classification

{
  "classification": {
    "document_type": "capital_call",
    "confidence": 0.95,
    "reasoning": "Document is a capital call notice from a private equity fund requesting capital contribution from limited partners.",
    "citation": "CAPITAL CALL NOTICE - Fund IV, L.P.",
    "citation_page": 1
  }
}
DocIntell uses a multi-stage classification process:
  1. Visual analysis - Layout, headers, and document structure
  2. Text analysis - Key phrases, terminology, and language patterns
  3. Template matching - Common document formats (W-9, K-1, invoices, etc.)
The confidence score reflects how strongly the document matches the identified type. Scores above 0.90 are typically very reliable.
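One way to act on the classification confidence is to send anything at or below a cutoff to a human reviewer. The sketch below assumes a cutoff of 0.90, borrowed from the guidance above. The function name and the cutoff are illustrative choices, not DocIntell behavior.

```python
# Assumption: treat 0.90 as the review cutoff, following the "above 0.90" guidance.
REVIEW_THRESHOLD = 0.90


def needs_classification_review(
    classification: dict, threshold: float = REVIEW_THRESHOLD
) -> bool:
    """Return True when the classification confidence is not above the threshold."""
    return classification["confidence"] <= threshold


classification = {
    "document_type": "capital_call",
    "confidence": 0.95,
    "reasoning": "Document is a capital call notice from a private equity fund "
    "requesting capital contribution from limited partners.",
    "citation": "CAPITAL CALL NOTICE - Fund IV, L.P.",
    "citation_page": 1,
}

if needs_classification_review(classification):
    # The citation and page give the reviewer a starting point in the document.
    print(f"Review page {classification['citation_page']}: {classification['citation']}")
```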

Understanding Extracted Data

The extraction contains the actual data pulled from your document.

Extraction Fields

  • document_type (string, required) - Document type (matches classification)
  • page_count (integer) - Number of pages in the document
  • extraction_model (string) - LLM model used for extraction (e.g., google-vertex:gemini-2.5-flash)
  • processing_time_ms (integer) - How long extraction took in milliseconds
  • data (object, required) - The extracted fields as key-value pairs. Field names are in snake_case.
  • field_metadata (object) - Per-field metadata including confidence scores, page numbers, and source locations
  • validation (object) - Validation results with hard/soft violations

Example Extraction (Invoice)

{
  "extraction": {
    "document_type": "invoice",
    "page_count": 2,
    "extraction_model": "google-vertex:gemini-2.5-flash",
    "processing_time_ms": 3500,
    "data": {
      "invoice_number": "INV-2024-0892",
      "invoice_date": "2024-12-01",
      "due_date": "2024-12-31",
      "vendor_name": "Acme Corp",
      "total_amount": 1234.56,
      "currency": "USD",
      "line_items": [
        {
          "description": "Professional Services",
          "quantity": 40,
          "unit_price": 150.00,
          "amount": 6000.00
        }
      ]
    },
    "field_metadata": {
      "invoice_number": {
        "confidence": 0.98,
        "page_number": 1,
        "location_hint": "top right header",
        "raw_text": "INV-2024-0892"
      },
      "total_amount": {
        "confidence": 0.95,
        "page_number": 1,
        "location_hint": "bottom of page, summary section",
        "raw_text": "$1,234.56"
      }
    },
    "validation": {
      "is_valid": true,
      "hard_violations": [],
      "soft_violations": []
    }
  }
}

Understanding Field Metadata

Field metadata provides provenance and confidence information for each extracted field.
  • confidence (number) - Self-reported confidence from the LLM (0.0 to 1.0). Directionally useful but not calibrated: a confidence of 0.90 does not mean the value is correct 90% of the time.
  • page_number (integer) - Page where the value was found (1-indexed). Useful for manual verification.
  • location_hint (string) - Qualitative description of where the value appears on the page (e.g., “top header”, “in summary table”, “footer”)
  • raw_text (string) - The original text as it appeared in the document before parsing

Example Field Metadata

{
  "field_metadata": {
    "invoice_number": {
      "confidence": 0.98,
      "page_number": 1,
      "location_hint": "top right header",
      "raw_text": "INV-2024-0892"
    },
    "total_amount": {
      "confidence": 0.95,
      "page_number": 1,
      "location_hint": "bottom of page, summary section",
      "raw_text": "$1,234.56"
    },
    "due_date": {
      "confidence": 0.92,
      "page_number": 1,
      "location_hint": "near invoice date in header",
      "raw_text": "Due: December 31, 2024"
    }
  }
}
Confidence Score Guidelines:
  • 0.95+ - Very high confidence (rarely wrong)
  • 0.85-0.94 - High confidence (generally reliable)
  • 0.70-0.84 - Moderate confidence (worth verifying)
  • Below 0.70 - Low confidence (manual review recommended)
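The guideline bands can be turned into a small helper. This is a sketch only: the band labels and function names are illustrative, and scores that fall between the listed ranges (for example 0.945) are assigned to the lower band.

```python
def confidence_band(score: float) -> str:
    """Map a field confidence score to one of the guideline bands."""
    if score >= 0.95:
        return "very_high"
    if score >= 0.85:
        return "high"
    if score >= 0.70:
        return "moderate"
    return "low"


def fields_to_verify(field_metadata: dict) -> list:
    """Return names of fields in the moderate or low bands, sorted by name."""
    return sorted(
        name
        for name, meta in field_metadata.items()
        if confidence_band(meta["confidence"]) in ("moderate", "low")
    )


# Confidence values from the field metadata example above.
field_metadata = {
    "invoice_number": {"confidence": 0.98},
    "total_amount": {"confidence": 0.95},
    "due_date": {"confidence": 0.92},
}
bands = {
    name: confidence_band(meta["confidence"])
    for name, meta in field_metadata.items()
}
```

Because the scores are not calibrated, the bands are best used to prioritize review rather than to accept values unconditionally.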

Understanding Validation Results

Validation checks whether the extracted data meets expected constraints.

Validation Types

  1. Hard Violations - Critical errors that indicate extraction failure
  2. Soft Violations - Warnings that may require attention but don’t fail the extraction
  • is_valid (boolean, required) - true if all hard constraints passed, false if any hard violations exist
  • hard_violations (array) - List of critical validation failures
  • soft_violations (array) - List of warnings or optional field issues

Example Validation (Passing)

{
  "validation": {
    "is_valid": true,
    "hard_violations": [],
    "soft_violations": [
      {
        "field": "swift_code",
        "severity": "soft",
        "message": "Optional field 'swift_code' not found in document"
      }
    ]
  }
}

Example Validation (Failing)

{
  "validation": {
    "is_valid": false,
    "hard_violations": [
      {
        "field": "due_date",
        "severity": "hard",
        "message": "Required field 'due_date' is missing"
      },
      {
        "field": "total_amount",
        "severity": "hard",
        "message": "Field 'total_amount' failed validation: must be a positive number"
      }
    ],
    "soft_violations": []
  }
}
When is_valid is false, the extracted data may be incomplete or unreliable. Review hard_violations to understand what went wrong.
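A validation object can be flattened into readable lines for review or logging. The following sketch assumes each violation carries the field, severity, and message keys shown in the examples above. The function name is illustrative.

```python
def summarize_validation(validation: dict) -> list:
    """Return one line per violation, hard violations first."""
    lines = []
    for key in ("hard_violations", "soft_violations"):
        for violation in validation.get(key, []):
            lines.append(
                f"[{violation['severity']}] {violation['field']}: {violation['message']}"
            )
    return lines


# The failing validation example from above.
validation = {
    "is_valid": False,
    "hard_violations": [
        {
            "field": "due_date",
            "severity": "hard",
            "message": "Required field 'due_date' is missing",
        },
        {
            "field": "total_amount",
            "severity": "hard",
            "message": "Field 'total_amount' failed validation: must be a positive number",
        },
    ],
    "soft_violations": [],
}

if not validation["is_valid"]:
    for line in summarize_validation(validation):
        print(line)
```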

Complete Example: Invoice

Here’s a full response for an invoice extraction:
{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "document_id": "6789def0-abcd-4567-ef01-23456789abcd",
  "status": "completed",
  "classification": {
    "document_type": "invoice",
    "confidence": 0.96,
    "reasoning": "Document is a vendor invoice with itemized charges and payment terms.",
    "citation": "INVOICE",
    "citation_page": 1
  },
  "extraction": {
    "document_type": "invoice",
    "page_count": 2,
    "extraction_model": "google-vertex:gemini-2.5-flash",
    "processing_time_ms": 3500,
    "data": {
      "invoice_number": "INV-2024-0892",
      "invoice_date": "2024-12-01",
      "due_date": "2024-12-31",
      "vendor_name": "Acme Corp",
      "vendor_address": "123 Main St, San Francisco, CA 94105",
      "customer_name": "ABC Capital Partners",
      "total_amount": 1234.56,
      "currency": "USD",
      "payment_terms": "Net 30",
      "line_items": [
        {
          "description": "Professional Services - November 2024",
          "quantity": 40,
          "unit_price": 150.00,
          "amount": 6000.00
        },
        {
          "description": "Software License",
          "quantity": 1,
          "unit_price": 500.00,
          "amount": 500.00
        }
      ]
    },
    "field_metadata": {
      "invoice_number": {
        "confidence": 0.98,
        "page_number": 1,
        "location_hint": "top right header",
        "raw_text": "INV-2024-0892"
      },
      "invoice_date": {
        "confidence": 0.97,
        "page_number": 1,
        "location_hint": "header section below invoice number",
        "raw_text": "Date: December 1, 2024"
      },
      "total_amount": {
        "confidence": 0.95,
        "page_number": 1,
        "location_hint": "bottom of page, summary section",
        "raw_text": "Total: $1,234.56"
      },
      "vendor_name": {
        "confidence": 0.99,
        "page_number": 1,
        "location_hint": "top left header",
        "raw_text": "Acme Corp"
      }
    },
    "validation": {
      "is_valid": true,
      "hard_violations": [],
      "soft_violations": []
    }
  }
}

Complete Example: Capital Call

Here’s a full response for a capital call extraction:
{
  "job_id": "660e8400-e29b-41d4-a716-446655440001",
  "document_id": "7890abc1-def2-5678-9012-345678901234",
  "status": "completed",
  "classification": {
    "document_type": "capital_call",
    "confidence": 0.95,
    "reasoning": "Document is a capital call notice from a private equity fund requesting capital contribution from limited partners.",
    "citation": "CAPITAL CALL NOTICE - Fund IV, L.P.",
    "citation_page": 1
  },
  "extraction": {
    "document_type": "capital_call",
    "page_count": 3,
    "extraction_model": "google-vertex:gemini-2.5-flash",
    "processing_time_ms": 4200,
    "data": {
      "fund_name": "ABC Partners Fund IV, L.P.",
      "call_reference": "CC-2024-Q4-001",
      "notice_date": "2024-12-01",
      "due_date": "2024-12-15",
      "call_amount_lp": 4500000.00,
      "call_amount_fund": 50000000.00,
      "lp_ownership_percentage": 9.0,
      "investment_amount": 4200000.00,
      "management_fee_amount": 250000.00,
      "other_expenses_amount": 50000.00,
      "bank_name": "Silicon Valley Bank",
      "account_number": "****1234",
      "routing_number": "121000248",
      "swift_code": "SVBKUS6S",
      "wire_reference": "ABC Fund IV - CC-2024-Q4-001"
    },
    "field_metadata": {
      "fund_name": {
        "confidence": 0.99,
        "page_number": 1,
        "location_hint": "top of page, main header",
        "raw_text": "ABC Partners Fund IV, L.P."
      },
      "call_amount_lp": {
        "confidence": 0.96,
        "page_number": 1,
        "location_hint": "summary table, highlighted row",
        "raw_text": "$4,500,000.00"
      },
      "due_date": {
        "confidence": 0.98,
        "page_number": 1,
        "location_hint": "prominently displayed below header",
        "raw_text": "Payment Due: December 15, 2024"
      },
      "bank_name": {
        "confidence": 0.97,
        "page_number": 2,
        "location_hint": "wire instructions section",
        "raw_text": "Silicon Valley Bank"
      },
      "swift_code": {
        "confidence": 0.94,
        "page_number": 2,
        "location_hint": "wire instructions section",
        "raw_text": "SWIFT: SVBKUS6S"
      }
    },
    "validation": {
      "is_valid": true,
      "hard_violations": [],
      "soft_violations": [
        {
          "field": "call_amount_calculation",
          "severity": "soft",
          "message": "LP call amount ($4,500,000) checked against fund call ($50,000,000) × ownership (9.0%) = $4,500,000. Difference: $0 (within tolerance)."
        }
      ]
    }
  }
}
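The amounts in this capital call example can be cross-checked on the client side. The sketch below recomputes the LP share from the fund-level call and the ownership percentage, and checks that the investment, management fee, and other expenses add up to the LP call amount. The function name and the one-dollar tolerance are assumptions for illustration; this guide does not state which tolerance DocIntell applies.

```python
def check_capital_call(data: dict, tolerance: float = 1.00) -> dict:
    """Cross-check capital call amounts; tolerance is in currency units (assumed)."""
    # Expected LP share = fund-level call x ownership percentage.
    expected_lp = data["call_amount_fund"] * data["lp_ownership_percentage"] / 100
    # The three component amounts should add up to the LP call amount.
    components = (
        data["investment_amount"]
        + data["management_fee_amount"]
        + data["other_expenses_amount"]
    )
    pro_rata_diff = data["call_amount_lp"] - expected_lp
    components_diff = data["call_amount_lp"] - components
    return {
        "pro_rata_difference": pro_rata_diff,
        "pro_rata_ok": abs(pro_rata_diff) <= tolerance,
        "components_difference": components_diff,
        "components_ok": abs(components_diff) <= tolerance,
    }


# Values from the capital call example above.
data = {
    "call_amount_lp": 4500000.00,
    "call_amount_fund": 50000000.00,
    "lp_ownership_percentage": 9.0,
    "investment_amount": 4200000.00,
    "management_fee_amount": 250000.00,
    "other_expenses_amount": 50000.00,
}
result = check_capital_call(data)
```

For these values, 50,000,000 × 9.0% = 4,500,000 and 4,200,000 + 250,000 + 50,000 = 4,500,000, so both checks pass with a difference of zero.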

Error Handling

Job Not Completed

If you try to get results before the job completes:
{
  "error": "job_not_completed",
  "message": "Job not completed. Current status: processing. Results are only available for completed jobs."
}
HTTP Status: 400 Bad Request
Fix: Wait for the job to complete or use webhooks for notifications.
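If webhooks are not an option, a client can retry while the API reports job_not_completed. The sketch below takes a caller-supplied fetch function that returns an (HTTP status, parsed JSON body) pair, so it works with any HTTP library. The function name, the retry counts, and the assumption that a successful response uses HTTP 200 are illustrative and not confirmed by this guide.

```python
import time


def wait_for_results(fetch, job_id, max_attempts=10, delay_seconds=2.0, sleep=time.sleep):
    """Call fetch(job_id) until results are available or attempts run out.

    fetch must return a (status_code, body_dict) tuple.
    """
    for _ in range(max_attempts):
        status, body = fetch(job_id)
        if status == 200:  # assumption: success is reported as HTTP 200
            return body
        if status == 400 and body.get("error") == "job_not_completed":
            sleep(delay_seconds)  # job still processing; wait and retry
            continue
        # Any other error (for example 404 not_found) will not resolve by waiting.
        raise RuntimeError(f"Unexpected response {status}: {body.get('message')}")
    raise TimeoutError(f"Job {job_id} did not complete after {max_attempts} attempts")
```

Passing sleep as a parameter keeps the loop easy to test; production code can rely on the default time.sleep.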

Job Not Found

{
  "error": "not_found",
  "message": "Job not found: 550e8400-e29b-41d4-a716-446655440000. It may not exist or you may not have access to it."
}
HTTP Status: 404 Not Found
Possible Causes:
  • Job ID does not exist
  • Job belongs to a different tenant
  • Typo in the job ID

Best Practices

Check Confidence Scores

Review field-level confidence scores for critical data. Fields with low confidence may need manual verification.

Use Page Numbers

The page_number and location_hint help you quickly locate and verify extracted values in the original PDF.
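For manual verification, it can help to group fields by page so a reviewer works through the PDF in order. The following sketch pairs each parsed value with its raw_text and location_hint. The function name is illustrative, and the sample uses a subset of the capital call example.

```python
from collections import defaultdict


def group_fields_by_page(extraction: dict) -> dict:
    """Group extracted fields by page_number, with pages in ascending order."""
    pages = defaultdict(list)
    for name, meta in extraction.get("field_metadata", {}).items():
        pages[meta["page_number"]].append(
            {
                "field": name,
                "value": extraction["data"].get(name),  # parsed value
                "raw_text": meta["raw_text"],  # text as printed in the document
                "location_hint": meta["location_hint"],
            }
        )
    return dict(sorted(pages.items()))


extraction = {
    "data": {"due_date": "2024-12-15", "swift_code": "SVBKUS6S"},
    "field_metadata": {
        "swift_code": {
            "confidence": 0.94,
            "page_number": 2,
            "location_hint": "wire instructions section",
            "raw_text": "SWIFT: SVBKUS6S",
        },
        "due_date": {
            "confidence": 0.98,
            "page_number": 1,
            "location_hint": "prominently displayed below header",
            "raw_text": "Payment Due: December 15, 2024",
        },
    },
}

for page, fields in group_fields_by_page(extraction).items():
    for item in fields:
        print(f"p.{page} {item['field']}: {item['value']!r} <- {item['raw_text']!r}")
```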

Handle Soft Violations

Soft violations are warnings, not errors. They may indicate missing optional fields or minor inconsistencies.

Log Validation Failures

When is_valid is false, log the hard_violations for debugging and quality monitoring.

Next Steps