Understanding Extraction Results

After DocIntell processes your document, you receive two types of results: classification (what type of document it is) and extraction (the actual data pulled from the document). This guide explains how to retrieve and understand these results.

What You Get After Processing

Once a document completes processing (status: completed), you have access to:

Classification - What type of document was detected and why
Extraction - The actual data extracted from the document
Field Metadata - Confidence scores, page numbers, and source locations for each field (returned inside extraction as field_metadata)
Validation Results - Whether the extraction passed validation rules (returned inside extraction as validation)

Getting Full Extraction Results

Use GET /v1/jobs/{job_id}/results to retrieve the complete extraction output for a job:

curl -X GET https://api.docintell.com/v1/jobs/550e8400-e29b-41d4-a716-446655440000/results \
  -H "Authorization: Bearer dk_live_YOUR_API_KEY"

When to use /results vs /documents/{id}/data:

Use /jobs/{job_id}/results for full extraction output with all metadata
Use /documents/{id}/data for filtered data based on your custom views (see Views Guide)
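The same request as the curl example above can be sketched in Python using only the standard library. The function names and the timeout value are illustrative assumptions, not part of a DocIntell SDK; the URL and the Authorization header come from the curl example.

```python
import json
import urllib.request

BASE_URL = "https://api.docintell.com/v1"


def build_results_request(job_id: str, api_key: str) -> urllib.request.Request:
    """Build the GET request for /jobs/{job_id}/results without sending it."""
    return urllib.request.Request(
        f"{BASE_URL}/jobs/{job_id}/results",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )


def fetch_results(job_id: str, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON body (timeout is an assumed default)."""
    request = build_results_request(job_id, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Note that urllib raises urllib.error.HTTPError for 4xx responses, so the error bodies described under Error Handling would need to be read from that exception.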

Understanding Classification

The classification tells you what type of document was detected and why.

Classification Fields

document_type (string, required) - The detected document type code (e.g., invoice, capital_call, k1)
confidence (number, required) - Classification confidence score from 0.0 to 1.0 (higher is more confident)
reasoning (string, required) - 1-2 sentence explanation of why this type was chosen
citation (string) - Direct quote from the document that supports the classification
citation_page (integer) - Page number where the citation was found (1-indexed)

Example Classification

{
  "classification": {
    "document_type": "capital_call",
    "confidence": 0.95,
    "reasoning": "Document is a capital call notice from a private equity fund requesting capital contribution from limited partners.",
    "citation": "CAPITAL CALL NOTICE - Fund IV, L.P.",
    "citation_page": 1
  }
}

How is the document type determined?

DocIntell uses a multi-stage classification process:

Visual analysis - Layout, headers, and document structure
Text analysis - Key phrases, terminology, and language patterns
Template matching - Common document formats (W-9, K-1, invoices, etc.)

The confidence score reflects how strongly the document matches the identified type. Scores above 0.90 are typically very reliable.
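One way to act on the classification is to accept scores above 0.90 and route everything else to review. The sketch below assumes that policy; the 0.90 threshold comes from the sentence above, while the function name and the summary format are hypothetical.

```python
def summarize_classification(classification: dict, threshold: float = 0.90) -> str:
    """Return a one-line summary with an accept/review verdict."""
    doc_type = classification["document_type"]
    confidence = classification["confidence"]
    # Scores above the threshold are accepted; the rest go to review.
    verdict = "accept" if confidence > threshold else "review"
    summary = f"{doc_type} ({confidence:.2f}): {verdict}"

    # citation and citation_page are not marked required, so read them defensively.
    citation = classification.get("citation")
    if citation is not None:
        page = classification.get("citation_page")
        where = f" on page {page}" if page is not None else ""
        summary += f'; cited "{citation}"{where}'
    return summary
```

Including the citation in the summary gives a reviewer the supporting quote without opening the document.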

Understanding Extracted Data

The extraction contains the actual data pulled from your document.

Extraction Fields

document_type (string, required) - Document type (matches classification)
page_count (integer) - Number of pages in the document
extraction_model (string) - LLM used for extraction (e.g., google-vertex:gemini-2.5-flash)
processing_time_ms (integer) - How long extraction took in milliseconds
data (object, required) - The extracted fields as key-value pairs. Field names are in snake_case.
field_metadata (object) - Per-field metadata including confidence scores, page numbers, and source locations
validation (object) - Validation results with hard/soft violations

Example Extraction (Invoice)

{
  "extraction": {
    "document_type": "invoice",
    "page_count": 2,
    "extraction_model": "google-vertex:gemini-2.5-flash",
    "processing_time_ms": 3500,
    "data": {
      "invoice_number": "INV-2024-0892",
      "invoice_date": "2024-12-01",
      "due_date": "2024-12-31",
      "vendor_name": "Acme Corp",
      "total_amount": 1234.56,
      "currency": "USD",
      "line_items": [
        {
          "description": "Professional Services",
          "quantity": 8,
          "unit_price": 154.32,
          "amount": 1234.56
        }
      ]
    },
    "field_metadata": {
      "invoice_number": {
        "confidence": 0.98,
        "page_number": 1,
        "location_hint": "top right header",
        "raw_text": "INV-2024-0892"
      },
      "total_amount": {
        "confidence": 0.95,
        "page_number": 1,
        "location_hint": "bottom of page, summary section",
        "raw_text": "$1,234.56"
      }
    },
    "validation": {
      "is_valid": true,
      "hard_violations": [],
      "soft_violations": []
    }
  }
}
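In the example above, field_metadata has entries for only some of the keys in data. Assuming real responses can also omit metadata for some fields, a small helper can list the fields that have no confidence or provenance information. The function name is hypothetical.

```python
def fields_without_metadata(extraction: dict) -> list[str]:
    """Return the keys of data that have no entry in field_metadata."""
    # field_metadata is not marked required, so treat a missing object as empty.
    metadata = extraction.get("field_metadata") or {}
    return [name for name in extraction["data"] if name not in metadata]
```

Fields returned by this helper cannot be checked against a confidence score, so they may deserve the same scrutiny as low-confidence fields.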

Understanding Field Metadata

Field metadata provides provenance and confidence information for each extracted field.

confidence (number) - Self-reported confidence from the LLM (0.0 to 1.0). Directionally useful but not calibrated: a confidence of 0.90 does not mean 90% accuracy.
page_number (integer) - Page where the value was found (1-indexed). Useful for manual verification.
location_hint (string) - Qualitative description of where on the page (e.g., “top header”, “in summary table”, “footer”)
raw_text (string) - The original text as it appeared in the document before parsing

Example Field Metadata

{
  "field_metadata": {
    "invoice_number": {
      "confidence": 0.98,
      "page_number": 1,
      "location_hint": "top right header",
      "raw_text": "INV-2024-0892"
    },
    "total_amount": {
      "confidence": 0.95,
      "page_number": 1,
      "location_hint": "bottom of page, summary section",
      "raw_text": "$1,234.56"
    },
    "due_date": {
      "confidence": 0.92,
      "page_number": 1,
      "location_hint": "near invoice date in header",
      "raw_text": "Due: December 31, 2024"
    }
  }
}

Confidence Score Guidelines:

0.95+ - Very high confidence (rarely wrong)
0.85-0.94 - High confidence (generally reliable)
0.70-0.84 - Moderate confidence (worth verifying)
Below 0.70 - Low confidence (manual review recommended)
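The guideline bands above can be turned into a review queue. This is a minimal sketch: the tier labels and function names are illustrative, and it assumes that a field with no confidence value should also be checked.

```python
from typing import Optional


def confidence_tier(confidence: Optional[float]) -> str:
    """Map a field confidence to the guideline bands listed above."""
    if confidence is None:
        return "unknown"  # assumption: metadata may omit the score
    if confidence >= 0.95:
        return "very_high"
    if confidence >= 0.85:
        return "high"  # values between 0.94 and 0.95 are treated as high
    if confidence >= 0.70:
        return "moderate"
    return "low"


def fields_to_verify(field_metadata: dict) -> list[dict]:
    """Collect fields whose tier suggests a manual check, with where to look."""
    flagged = []
    for name, meta in field_metadata.items():
        tier = confidence_tier(meta.get("confidence"))
        if tier in ("moderate", "low", "unknown"):
            flagged.append({
                "field": name,
                "tier": tier,
                "page_number": meta.get("page_number"),
                "location_hint": meta.get("location_hint"),
                "raw_text": meta.get("raw_text"),
            })
    return flagged
```

Because the scores are not calibrated, these cutoffs are a starting point to tune against your own documents rather than fixed rules.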

Understanding Validation Results

Validation checks whether the extracted data meets expected constraints.

Validation Types

Hard Violations - Critical errors that indicate extraction failure
Soft Violations - Warnings that may require attention but don’t fail the extraction

is_valid (boolean, required) - true if all hard constraints passed, false if any hard violations exist
hard_violations (array) - List of critical validation failures
soft_violations (array) - List of warnings or optional field issues

Example Validation (Passing)

{
  "validation": {
    "is_valid": true,
    "hard_violations": [],
    "soft_violations": [
      {
        "field": "swift_code",
        "severity": "soft",
        "message": "Optional field 'swift_code' not found in document"
      }
    ]
  }
}

Example Validation (Failing)

{
  "validation": {
    "is_valid": false,
    "hard_violations": [
      {
        "field": "due_date",
        "severity": "hard",
        "message": "Required field 'due_date' is missing"
      },
      {
        "field": "total_amount",
        "severity": "hard",
        "message": "Field 'total_amount' failed validation: must be a positive number"
      }
    ],
    "soft_violations": []
  }
}

When is_valid is false, the extracted data may be incomplete or unreliable. Review hard_violations to understand what went wrong.
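A minimal sketch of handling the validation object: log every violation and use is_valid to decide whether to accept the extraction. The logger name and function names are assumptions; the field, severity, and message keys are taken from the examples above.

```python
import logging

logger = logging.getLogger("docintell.validation")  # hypothetical logger name


def violation_lines(validation: dict) -> list[str]:
    """Format hard violations first, then soft ones, as one line each."""
    lines = []
    for key in ("hard_violations", "soft_violations"):
        for violation in validation.get(key) or []:
            lines.append(
                f"[{violation['severity']}] {violation['field']}: {violation['message']}"
            )
    return lines


def accept_extraction(validation: dict) -> bool:
    """Log all violations and return True only when is_valid is true."""
    is_valid = bool(validation["is_valid"])
    # Hard violations make is_valid false, so log at error level in that case.
    level = logging.WARNING if is_valid else logging.ERROR
    for line in violation_lines(validation):
        logger.log(level, line)
    return is_valid
```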

Complete Example: Invoice

Here’s a full response for an invoice extraction:

{
  "job_id": "550e8400-e29b-41d4-a716-446655440000",
  "document_id": "6789def0-abcd-4567-ef01-23456789abcd",
  "status": "completed",
  "classification": {
    "document_type": "invoice",
    "confidence": 0.96,
    "reasoning": "Document is a vendor invoice with itemized charges and payment terms.",
    "citation": "INVOICE",
    "citation_page": 1
  },
  "extraction": {
    "document_type": "invoice",
    "page_count": 2,
    "extraction_model": "google-vertex:gemini-2.5-flash",
    "processing_time_ms": 3500,
    "data": {
      "invoice_number": "INV-2024-0892",
      "invoice_date": "2024-12-01",
      "due_date": "2024-12-31",
      "vendor_name": "Acme Corp",
      "vendor_address": "123 Main St, San Francisco, CA 94105",
      "customer_name": "ABC Capital Partners",
      "total_amount": 1234.56,
      "currency": "USD",
      "payment_terms": "Net 30",
      "line_items": [
        {
          "description": "Professional Services - November 2024",
          "quantity": 4,
          "unit_price": 183.64,
          "amount": 734.56
        },
        {
          "description": "Software License",
          "quantity": 1,
          "unit_price": 500.00,
          "amount": 500.00
        }
      ]
    },
    "field_metadata": {
      "invoice_number": {
        "confidence": 0.98,
        "page_number": 1,
        "location_hint": "top right header",
        "raw_text": "INV-2024-0892"
      },
      "invoice_date": {
        "confidence": 0.97,
        "page_number": 1,
        "location_hint": "header section below invoice number",
        "raw_text": "Date: December 1, 2024"
      },
      "total_amount": {
        "confidence": 0.95,
        "page_number": 1,
        "location_hint": "bottom of page, summary section",
        "raw_text": "Total: $1,234.56"
      },
      "vendor_name": {
        "confidence": 0.99,
        "page_number": 1,
        "location_hint": "top left header",
        "raw_text": "Acme Corp"
      }
    },
    "validation": {
      "is_valid": true,
      "hard_violations": [],
      "soft_violations": []
    }
  }
}

Complete Example: Capital Call

Here’s a full response for a capital call extraction:

{
  "job_id": "660e8400-e29b-41d4-a716-446655440001",
  "document_id": "7890abc1-def2-5678-9012-345678901234",
  "status": "completed",
  "classification": {
    "document_type": "capital_call",
    "confidence": 0.95,
    "reasoning": "Document is a capital call notice from a private equity fund requesting capital contribution from limited partners.",
    "citation": "CAPITAL CALL NOTICE - Fund IV, L.P.",
    "citation_page": 1
  },
  "extraction": {
    "document_type": "capital_call",
    "page_count": 3,
    "extraction_model": "google-vertex:gemini-2.5-flash",
    "processing_time_ms": 4200,
    "data": {
      "fund_name": "ABC Partners Fund IV, L.P.",
      "call_reference": "CC-2024-Q4-001",
      "notice_date": "2024-12-01",
      "due_date": "2024-12-15",
      "call_amount_lp": 4500000.00,
      "call_amount_fund": 50000000.00,
      "lp_ownership_percentage": 9.0,
      "investment_amount": 4200000.00,
      "management_fee_amount": 250000.00,
      "other_expenses_amount": 50000.00,
      "bank_name": "Silicon Valley Bank",
      "account_number": "****1234",
      "routing_number": "121000248",
      "swift_code": "SVBKUS6S",
      "wire_reference": "ABC Fund IV - CC-2024-Q4-001"
    },
    "field_metadata": {
      "fund_name": {
        "confidence": 0.99,
        "page_number": 1,
        "location_hint": "top of page, main header",
        "raw_text": "ABC Partners Fund IV, L.P."
      },
      "call_amount_lp": {
        "confidence": 0.96,
        "page_number": 1,
        "location_hint": "summary table, highlighted row",
        "raw_text": "$4,500,000.00"
      },
      "due_date": {
        "confidence": 0.98,
        "page_number": 1,
        "location_hint": "prominently displayed below header",
        "raw_text": "Payment Due: December 15, 2024"
      },
      "bank_name": {
        "confidence": 0.97,
        "page_number": 2,
        "location_hint": "wire instructions section",
        "raw_text": "Silicon Valley Bank"
      },
      "swift_code": {
        "confidence": 0.94,
        "page_number": 2,
        "location_hint": "wire instructions section",
        "raw_text": "SWIFT: SVBKUS6S"
      }
    },
    "validation": {
      "is_valid": true,
      "hard_violations": [],
      "soft_violations": []
    }
  }
}
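The amounts in this example are internally consistent, which can be checked on the client side. The sketch below recomputes the pro-rata share (call_amount_fund × lp_ownership_percentage / 100) and the sum of the three component amounts. It assumes that investment, management fee, and other expenses are the only components of the LP call, which holds for this example but is not stated as a rule. The function name and the one-cent tolerance are illustrative.

```python
from decimal import Decimal


def check_capital_call(data: dict, tolerance: Decimal = Decimal("0.01")) -> dict:
    """Compare call_amount_lp with the pro-rata share and the component sum."""
    def dec(key: str) -> Decimal:
        # Convert through str to avoid binary floating-point noise.
        return Decimal(str(data[key]))

    lp_call = dec("call_amount_lp")
    pro_rata = dec("call_amount_fund") * dec("lp_ownership_percentage") / Decimal(100)
    components = (
        dec("investment_amount")
        + dec("management_fee_amount")
        + dec("other_expenses_amount")
    )
    pro_rata_difference = lp_call - pro_rata
    components_difference = lp_call - components
    return {
        "pro_rata_difference": pro_rata_difference,
        "components_difference": components_difference,
        "within_tolerance": abs(pro_rata_difference) <= tolerance
        and abs(components_difference) <= tolerance,
    }
```

For the data above, 50,000,000 × 9.0% gives 4,500,000 and 4,200,000 + 250,000 + 50,000 gives 4,500,000, so both differences are zero.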

Error Handling

Job Not Completed

If you try to get results before the job completes:

{
  "error": "job_not_completed",
  "message": "Job not completed. Current status: processing. Results are only available for completed jobs."
}

HTTP Status: 400 Bad Request
Fix: Wait for the job to complete or use webhooks for notifications.
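If webhooks are not an option, waiting can be sketched as a bounded polling loop. The fetch callable is a hypothetical stand-in for whatever HTTP client you use; it should return the status code and the decoded JSON body. The sketch assumes a successful response has status 200, which this guide does not state.

```python
import time
from typing import Callable, Tuple


def wait_for_results(
    fetch: Callable[[], Tuple[int, dict]],
    max_attempts: int = 10,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until results are available or max_attempts is reached."""
    for attempt in range(max_attempts):
        status, body = fetch()
        if status == 200:  # assumed success status
            return body
        if status == 400 and body.get("error") == "job_not_completed":
            # Still processing: pause before the next attempt, if any remain.
            if attempt < max_attempts - 1:
                sleep(delay_seconds)
            continue
        raise RuntimeError(f"unexpected response {status}: {body.get('message')}")
    raise TimeoutError(f"job still not completed after {max_attempts} attempts")
```

The attempt limit matters because a job that never reaches completed would otherwise be polled forever.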

Job Not Found

{
  "error": "not_found",
  "message": "Job not found: 550e8400-e29b-41d4-a716-446655440000. It may not exist or you may not have access to it."
}

HTTP Status: 404 Not Found
Possible Causes:

Job ID does not exist
Job belongs to a different tenant
Typo in the job ID
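The two documented errors can be mapped to distinct exceptions so calling code can treat "try again later" differently from "check the job ID". The exception classes and function name are hypothetical, and the 200 success status is an assumption.

```python
class JobNotCompleted(Exception):
    """The job exists but has not reached the completed status."""


class JobNotFound(Exception):
    """The job ID is unknown or belongs to a different tenant."""


def check_response(status: int, body: dict) -> dict:
    """Return the body on success, or raise an exception for known errors."""
    if status == 200:  # assumed success status
        return body
    message = body.get("message", "")
    error = body.get("error")
    if status == 400 and error == "job_not_completed":
        raise JobNotCompleted(message)
    if status == 404 and error == "not_found":
        raise JobNotFound(message)
    raise RuntimeError(f"unexpected response {status}: {message}")
```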

Best Practices

Check Confidence Scores

Review field-level confidence scores for critical data. Fields with low confidence may need manual verification.

Use Page Numbers

The page_number and location_hint help you quickly locate and verify extracted values in the original PDF.

Handle Soft Violations

Soft violations are warnings, not errors. They may indicate missing optional fields or minor inconsistencies.

Log Validation Failures

When is_valid is false, log the hard_violations for debugging and quality monitoring.

Next Steps

Views Guide

Learn how to create custom views to filter extracted data

Webhook Setup

Get notified when extraction completes

Document Types

Browse all supported document types and their schemas

Error Handling

Handle extraction failures gracefully
