Pipeline
Four stages, each independently retryable. A flaky OCR pass never costs the whole document.
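A minimal sketch of per-stage retry, assuming each stage can be wrapped as a zero-argument callable. `run_stage`, `flaky_ocr`, and the attempt count are hypothetical names and values chosen for illustration, not the project's actual API.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_stage(stage: Callable[[], T], attempts: int = 3, delay_s: float = 0.0) -> T:
    """Run a single stage, retrying only that stage when it raises."""
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return stage()
        except Exception as exc:  # a real pipeline would likely catch narrower errors
            last_error = exc
            if attempt < attempts:
                time.sleep(delay_s)
    raise RuntimeError(f"stage failed after {attempts} attempts") from last_error


# Hypothetical OCR stage that times out twice before succeeding.
calls = {"ocr": 0}


def flaky_ocr() -> str:
    calls["ocr"] += 1
    if calls["ocr"] < 3:
        raise TimeoutError("OCR timed out")
    return "INVOICE INV-001"


text = run_stage(flaky_ocr)
```

Because the retry wraps one stage at a time, output from stages that already succeeded can be kept while the flaky one is re-run.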
PDFs through PyMuPDF, scans through Tesseract with OpenCV preprocessing, or straight through Gemini multimodal. Handles photographed receipts.
Invoice number, dates, vendor, line items, totals, currency, notes. Each with a confidence score and a source-text citation. Labels in Indonesian and English.
Fully offline with Tesseract + Llama 3 GGUF, hybrid with Tesseract + Gemini text, or full Gemini multimodal. One env var, AI_MODE, to switch.
JSON for pipelines, CSV for spreadsheets, XLSX for finance. One click. No transformation step.
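As a rough illustration of the export step, the sketch below serializes one flat record to JSON and CSV using only the standard library. The record shape and the `to_json` / `to_csv` helpers are assumptions made for this example; XLSX is left out because it needs a third-party writer.

```python
import csv
import io
import json

# Hypothetical flat record; the real schema carries more fields.
record = {
    "invoice_number": "INV-001",
    "vendor": "Example Vendor",
    "currency": "IDR",
    "total": 150000.0,
}


def to_json(rec: dict) -> str:
    """Serialize for downstream pipelines, keeping non-ASCII text readable."""
    return json.dumps(rec, ensure_ascii=False, indent=2)


def to_csv(rec: dict) -> str:
    """Serialize as a header row plus one data row for spreadsheets."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rec), lineterminator="\n")
    writer.writeheader()
    writer.writerow(rec)
    return buf.getvalue()


json_text = to_json(record)
csv_text = to_csv(record)
```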
AI_MODE
The same twelve fields, three execution paths. Switch by flipping one environment variable.
| Mode | Pipeline | Network | Accuracy | Latency |
|---|---|---|---|---|
| local | Tesseract → Llama 3 GGUF | None | Medium | ~20 s |
| hybrid | Tesseract → Gemini text | OCR text only | High | ~5 s |
| api | Gemini multimodal | Full document | Highest | ~8 s |
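One way the mode switch could be wired up, as a sketch: read `AI_MODE` from the environment and map it to the stage list from the table. The `PIPELINES` mapping, the `resolve_pipeline` helper, and the `local` default are assumptions for illustration; the stage strings are labels, not real function names.

```python
import os

# Stage labels mirror the table above; they are placeholders, not real callables.
PIPELINES = {
    "local": ["tesseract", "llama3-gguf"],
    "hybrid": ["tesseract", "gemini-text"],
    "api": ["gemini-multimodal"],
}


def resolve_pipeline(env=None, default="local"):
    """Return (mode, stages) for the AI_MODE found in env (os.environ if omitted)."""
    env = os.environ if env is None else env
    mode = env.get("AI_MODE", default).strip().lower()
    if mode not in PIPELINES:
        valid = ", ".join(sorted(PIPELINES))
        raise ValueError(f"AI_MODE must be one of: {valid}; got {mode!r}")
    return mode, PIPELINES[mode]


mode, stages = resolve_pipeline({"AI_MODE": "hybrid"})
```

Rejecting unknown values up front avoids silently falling back to a mode with different network behavior.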
Schema
Each value comes with a confidence score (0–1) and the source text that supports it. Numeric fields are coerced, dates are ISO-normalized, totals are cross-checked against quantity × unit price.
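The checks described above might look roughly like the following. This is a sketch under stated assumptions: `ExtractedField`, `normalize_date`, `line_total_matches`, the accepted date formats, and the 0.01 tolerance are all hypothetical and not taken from the project.

```python
from dataclasses import dataclass
from datetime import datetime
from decimal import Decimal


@dataclass
class ExtractedField:
    value: object
    confidence: float  # expected to lie in the 0-1 range
    source_text: str   # the text span that supports the value

    def __post_init__(self):
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within 0-1")


def normalize_date(raw: str, formats=("%d/%m/%Y", "%Y-%m-%d", "%d-%m-%Y")) -> str:
    """Try each assumed input format and return an ISO 8601 date string."""
    for fmt in formats:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {raw!r}")


def line_total_matches(quantity, unit_price, total, tolerance=Decimal("0.01")) -> bool:
    """Check that quantity x unit price agrees with the stated total."""
    expected = Decimal(str(quantity)) * Decimal(str(unit_price))
    return abs(expected - Decimal(str(total))) <= tolerance


invoice_date = ExtractedField(
    value=normalize_date("31/12/2024"),
    confidence=0.93,
    source_text="Tanggal: 31/12/2024",
)
```

`Decimal` is used for the cross-check so that currency amounts are compared without binary floating-point rounding.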