After DocIntell processes your document, you receive two types of results: classification (what type of document it is) and extraction (the actual data pulled from the document). This guide explains how to retrieve and understand these results.
What You Get After Processing
Once a document completes processing (status: completed), you have access to:
- Classification - What type of document was detected and why
- Extraction - The actual data extracted from the document
- Field Metadata - Confidence scores, page numbers, and source locations for each field
- Validation Results - Whether the extraction passed validation rules
Getting Full Extraction Results
Use GET /v1/jobs/{job_id}/results to retrieve the complete extraction output for a job.
When to use /results vs /documents/{id}/data:
- Use /jobs/{job_id}/results for full extraction output with all metadata
- Use /documents/{id}/data for filtered data based on your custom views (see Views Guide)
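A minimal sketch of calling the results endpoint with Python's standard library. The path /v1/jobs/{job_id}/results comes from this guide; the host, the Bearer authorization header, and the helper names are assumptions for illustration, so check your API reference for the actual base URL and authentication scheme.

```python
import json
import urllib.parse
import urllib.request

# Assumption: placeholder host, not taken from this guide.
BASE_URL = "https://api.example.com"


def build_results_request(job_id: str, api_key: str,
                          base_url: str = BASE_URL) -> urllib.request.Request:
    """Build a GET request for /v1/jobs/{job_id}/results."""
    # Percent-encode the job ID so unexpected characters cannot alter the path.
    path = f"/v1/jobs/{urllib.parse.quote(job_id, safe='')}/results"
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        method="GET",
        headers={
            # Assumption: Bearer-token auth; the real scheme may differ.
            "Authorization": f"Bearer {api_key}",
            "Accept": "application/json",
        },
    )


def fetch_results(job_id: str, api_key: str) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_results_request(job_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Keeping request construction separate from the network call makes the URL and headers easy to inspect or unit test without contacting the API.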
Understanding Classification
The classification tells you what type of document was detected and why.
Classification Fields
- The detected document type code (e.g., invoice, capital_call, k1)
- Classification confidence score from 0.0 to 1.0 (higher is more confident)
- A 1-2 sentence explanation of why this type was chosen
- A direct quote from the document that supports the classification
- Page number where the citation was found (1-indexed)
Example Classification
How is the document type determined?
DocIntell uses a multi-stage classification process:
- Visual analysis - Layout, headers, and document structure
- Text analysis - Key phrases, terminology, and language patterns
- Template matching - Common document formats (W-9, K-1, invoices, etc.)
Understanding Extracted Data
The extraction contains the actual data pulled from your document.
Extraction Fields
- Document type (matches classification)
- Number of pages in the document
- LLM model used for extraction (e.g., google-vertex:gemini-2.5-flash)
- How long extraction took in milliseconds
- The extracted fields as key-value pairs. Field names are in snake_case.
- Per-field metadata including confidence scores, page numbers, and source locations
- Validation results with hard/soft violations
Example Extraction (Invoice)
Understanding Field Metadata
Field metadata provides provenance and confidence information for each extracted field.
- Self-reported confidence from the LLM (0.0 to 1.0). Directionally useful but not calibrated: a confidence of 0.90 does not mean 90% accuracy.
- Page where the value was found (1-indexed). Useful for manual verification.
- Qualitative description of where on the page the value appears (e.g., “top header”, “in summary table”, “footer”)
- The original text as it appeared in the document before parsing
Example Field Metadata
Confidence Score Guidelines:
- 0.95+ - Very high confidence (rarely wrong)
- 0.85-0.94 - High confidence (generally reliable)
- 0.70-0.84 - Moderate confidence (worth verifying)
- Below 0.70 - Low confidence (manual review recommended)
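The guideline bands above can be applied mechanically to build a manual review queue. This is only a sketch: the thresholds come from the list above, and page_number and location_hint are names used elsewhere in this guide, but the shape of the metadata mapping (field name to a dict holding a "confidence" key) and the function names are assumptions.

```python
def confidence_band(score: float) -> str:
    """Map a confidence score (0.0 to 1.0) to one of the guideline bands."""
    if score >= 0.95:
        return "very_high"
    if score >= 0.85:
        return "high"
    if score >= 0.70:
        return "moderate"
    return "low"


def fields_to_review(field_metadata: dict, threshold: float = 0.85) -> list:
    """Return fields scoring below the threshold, lowest confidence first."""
    flagged = []
    for name, meta in field_metadata.items():
        # Assumption: each entry stores its score under "confidence".
        # A missing score is treated as 0.0 so the field is always reviewed.
        score = meta.get("confidence", 0.0)
        if score < threshold:
            flagged.append({
                "field": name,
                "confidence": score,
                "band": confidence_band(score),
                "page_number": meta.get("page_number"),
                "location_hint": meta.get("location_hint"),
            })
    return sorted(flagged, key=lambda item: item["confidence"])
```

Because the scores are self-reported and not calibrated, the default threshold of 0.85 is a starting point to tune against your own spot checks rather than a guarantee of accuracy.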
Understanding Validation Results
Validation checks whether the extracted data meets expected constraints.
Validation Types
- Hard Violations - Critical errors that indicate extraction failure
- Soft Violations - Warnings that may require attention but don’t fail the extraction
The validation result contains:
- true if all hard constraints passed, false if any hard violations exist
- List of critical validation failures (hard violations)
- List of warnings or optional field issues (soft violations)
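Acting on a validation result might look like the sketch below. The is_valid and hard_violations names appear in this guide; the "soft_violations" key, the logger name, and the function name are assumptions for illustration.

```python
import logging

logger = logging.getLogger("docintell.validation")  # hypothetical logger name


def check_validation(job_id: str, validation: dict) -> bool:
    """Log any violations and return True when the extraction passed."""
    # Assumption: soft violations are listed under "soft_violations".
    for warning in validation.get("soft_violations", []):
        logger.warning("job %s soft violation: %s", job_id, warning)

    if validation.get("is_valid", False):
        return True

    # Hard violations indicate extraction failure, so log them as errors.
    for violation in validation.get("hard_violations", []):
        logger.error("job %s hard violation: %s", job_id, violation)
    return False
```

Soft violations are logged as warnings without changing the return value, which matches their role as non-failing issues.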
Example Validation (Passing)
Example Validation (Failing)
Complete Example: Invoice
Here’s a full response for an invoice extraction:
Complete Example: Capital Call
Here’s a full response for a capital call extraction:
Error Handling
Job Not Completed
If you try to get results before the job completes, the API returns:
400 Bad Request
Fix: Wait for the job to complete or use webhooks for notifications.
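When webhooks are not an option, waiting can be done by polling. The sketch below assumes you supply a get_status callable that returns the job's current status string; this guide does not describe a status endpoint, so that part is left to the caller. Only the "completed" value comes from this guide, and the function name and defaults are illustrative.

```python
import time
from typing import Callable


def wait_for_completion(get_status: Callable[[], str],
                        interval: float = 2.0,
                        timeout: float = 300.0,
                        sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll until the status is "completed"; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if get_status() == "completed":
            return True
        # Stop if another wait would run past the deadline.
        if time.monotonic() + interval > deadline:
            return False
        sleep(interval)
```

The loop only recognizes the completed status. If your jobs can end in a failure state, check for it inside get_status and raise, so the caller does not wait for the full timeout.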
Job Not Found
404 Not Found
Possible Causes:
- Job ID does not exist
- Job belongs to a different tenant
- Typo in the job ID
Best Practices
Check Confidence Scores
Review field-level confidence scores for critical data. Fields with low confidence may need manual verification.
Use Page Numbers
The page_number and location_hint help you quickly locate and verify extracted values in the original PDF.
Handle Soft Violations
Soft violations are warnings, not errors. They may indicate missing optional fields or minor inconsistencies.
Log Validation Failures
When is_valid is false, log the hard_violations for debugging and quality monitoring.
Next Steps
Views Guide
Learn how to create custom views to filter extracted data
Webhook Setup
Get notified when extraction completes
Document Types
Browse all supported document types and their schemas
Error Handling
Handle extraction failures gracefully