How table extraction works
The conversion engine analyses each PDF page for grid-like structures — visible cell borders, consistent column gutters, repeated row spacing — and reconstructs them as rows and columns in an Excel sheet. Each PDF page typically becomes one worksheet (or one section within a worksheet) in the output workbook.
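As a rough illustration of the "consistent column gutters" signal, the sketch below finds vertical gaps that no word on the page crosses and uses them to split a row into cells. It is a minimal sketch under stated assumptions, not the engine's actual method: the input format (word boxes as `(x0, x1, text)` in points), the function names `find_gutters` and `split_into_columns`, and the `min_gap` threshold are all hypothetical.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical input format: one word = (x_start, x_end, text), in points.
Word = Tuple[float, float, str]


def find_gutters(words: List[Word], min_gap: float = 10.0) -> List[float]:
    """Return the x midpoints of horizontal gaps that no word overlaps.

    Words from every row are projected onto the x axis; a gap at least
    min_gap wide in that projection is treated as a column gutter.
    """
    if not words:
        return []
    spans = sorted((x0, x1) for x0, x1, _ in words)
    gutters = []
    right_edge = spans[0][1]
    for x0, x1 in spans[1:]:
        if x0 - right_edge >= min_gap:
            gutters.append((right_edge + x0) / 2)
        right_edge = max(right_edge, x1)
    return gutters


def split_into_columns(row: List[Word], gutters: List[float]) -> List[str]:
    """Assign each word in a row to a column by the midpoint of its box."""
    cells = [""] * (len(gutters) + 1)
    for x0, x1, text in sorted(row):
        idx = bisect_right(gutters, (x0 + x1) / 2)
        cells[idx] = (cells[idx] + " " + text).strip()
    return cells


header = [(10, 40, "Date"), (100, 160, "Description"), (250, 290, "Amount")]
entry = [(10, 50, "01/02"), (100, 130, "Card"), (134, 170, "payment"), (255, 290, "12.50")]
gutters = find_gutters(header + entry)
table = [split_into_columns(r, gutters) for r in (header, entry)]
```

The same idea also suggests why borderless tables are harder: when a gap between columns is narrower than the threshold, or a long value drifts into the neighbouring column, the gutter disappears and cells merge.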
What works best
- Tables with visible borders and consistent column widths
- Bank statements, invoices and financial reports
- Single-table-per-page PDFs (cleanest output)
- Digitally-generated PDFs with a text layer
What needs cleanup
- Borderless tables — columns may merge or split unpredictably
- Tables that span multiple pages — usually rebuilt per page, not joined
- Merged cells, multi-row headers and footnotes — handled on a best-effort basis, so check them against the PDF
- Scanned PDFs (no text layer) — need OCR first
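Because multi-page tables are usually rebuilt per page, joining them is a common cleanup step. The sketch below shows one possible approach, assuming each page's table has already been read into a list of rows and that the header row is repeated on every page. The function name `join_page_tables` is hypothetical, and tables whose continuation pages have no header would need a different rule (for example, matching on column count).

```python
from typing import List

Table = List[List[str]]  # one table = rows of cell text


def join_page_tables(pages: List[Table]) -> List[Table]:
    """Merge consecutive per-page tables that start with the same header row."""
    merged: List[Table] = []
    for table in pages:
        if not table:
            continue  # skip pages where nothing was extracted
        if merged and merged[-1][0] == table[0]:
            # Same header as the previous table: append rows, drop the repeat.
            merged[-1].extend(row[:] for row in table[1:])
        else:
            merged.append([row[:] for row in table])
    return merged


page1 = [["Date", "Amount"], ["01/02", "12.50"]]
page2 = [["Date", "Amount"], ["02/02", "8.00"]]
page3 = [["Item", "Qty"], ["Pen", "3"]]
joined = join_page_tables([page1, page2, page3])
```

Here the first two pages collapse into a single three-row table, while the third page, which has a different header, stays separate.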
Why no formulas?
A PDF stores the visible values shown on the page, not the formulas that produced them. The converted Excel file therefore contains the same numbers as the PDF, as plain values — useful for further analysis, but not editable as live formulas.
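If live formulas are needed, they have to be rebuilt after conversion. As a hedged illustration, the sketch below checks whether the last value in a column equals the sum of the values above it and, if so, proposes a `SUM` formula for that cell. This is only a guess at what the original spreadsheet contained, since the PDF cannot confirm it. The function name `suggest_sum_formula` and the default cell positions are hypothetical, and the values are assumed to be plain numeric strings without currency symbols or thousands separators.

```python
from decimal import Decimal
from typing import List, Optional


def suggest_sum_formula(
    values: List[str], column: str = "B", first_row: int = 2
) -> Optional[str]:
    """Return a SUM formula for the last cell if it equals the sum of the rest.

    values holds the column's cell text from top to bottom, with the
    candidate total last. Returns None when the numbers do not add up.
    """
    if len(values) < 3:
        return None  # need at least two items plus a total
    *items, total = [Decimal(v) for v in values]
    if sum(items) != total:
        return None
    last_row = first_row + len(items) - 1
    return f"=SUM({column}{first_row}:{column}{last_row})"


matching = suggest_sum_formula(["12.50", "8.00", "20.50"])
not_matching = suggest_sum_formula(["1.00", "2.00", "4.00"])
```

`Decimal` is used instead of `float` so that amounts such as 0.10 and 0.20 compare exactly.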