
When you need more control

The default PDF parsing settings work well for most documents. But when you’re working with complex PDFs — nested layouts, handwriting, large multi-page files, or production flows where consistency matters — the advanced settings give you more precise control.

Text parsing approach

Controls how Parabola reads the text out of your PDF before the AI processes it.
Parabola selects the best parsing method based on the document. This is the right choice for most situations — start here and only change it if you’re seeing issues with the output.

Page filtering

Default: disabled
If you only need data from specific pages, page filtering can meaningfully speed up your runs: the fewer pages the AI needs to read, the faster it processes.
If your document has a consistent structure (for example, header info always on page 1 and line items always on pages 2–3), locking the step to those pages eliminates unnecessary processing on every run.
When enabled, configure two things:

Action:
| Option | When to use |
| --- | --- |
| Keep | Parse only the pages you specify |
| Remove | Parse everything except the pages you specify |
| Autodetect | Let Parabola choose the most relevant pages |

Which pages:
| Option | How it works |
| --- | --- |
| The first N pages | Input a number; parses that many pages from the start |
| The last N pages | Input a number; parses that many pages from the end |
| These pages | Input a comma-separated list, e.g., 1, 3, 5 |
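
To make the Keep and Remove options concrete, here is a minimal Python sketch of how an action and a page selection could combine into the set of pages that gets parsed. It is only an illustration of the logic described above, not Parabola's implementation: the function names (`select_pages`, `parse_page_list`) and the option strings (`"keep"`, `"first_n"`, and so on) are hypothetical, and Autodetect is left out because its behavior is internal to Parabola.

```python
from typing import List, Sequence


def parse_page_list(text: str) -> List[int]:
    """Turn a comma-separated string such as "1, 3, 5" into page numbers."""
    return [int(part) for part in text.split(",") if part.strip()]


def select_pages(total_pages: int, action: str, which: str,
                 n: int = 0, pages: Sequence[int] = ()) -> List[int]:
    """Return the 1-based page numbers that would be parsed.

    All names here are hypothetical; they only mirror the options in the tables.
    """
    all_pages = list(range(1, total_pages + 1))

    # Step 1: resolve the "Which pages" choice into a concrete list.
    if which == "first_n":
        chosen = all_pages[:n]
    elif which == "last_n":
        chosen = all_pages[-n:] if n > 0 else []
    elif which == "these":
        wanted = set(pages)
        chosen = [p for p in all_pages if p in wanted]
    else:
        raise ValueError(f"unknown page selector: {which}")

    # Step 2: apply the action to that list.
    if action == "keep":
        return chosen
    if action == "remove":
        excluded = set(chosen)
        return [p for p in all_pages if p not in excluded]
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    # Keep the first 2 pages of a 5-page document.
    print(select_pages(5, "keep", "first_n", n=2))
    # Remove pages 1, 3, and 5 from a 5-page document.
    print(select_pages(5, "remove", "these", pages=parse_page_list("1, 3, 5")))
```

Under these assumptions, Keep with "The first N pages" set to 2 parses pages 1 and 2 of a 5-page file, while Remove with "These pages" set to 1, 3, 5 parses pages 2 and 4.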

Tips & best practices

PDFs must be under 500MB and 30 pages or fewer. Files that exceed either limit will not process.
Password-protected PDFs cannot be parsed. Ask your vendor to remove the password before sending, or build a decryption step upstream.
The more pages the AI reads, the longer the run. If your data appears on consistent pages across documents, use Page Filtering above to limit scope. Fewer pages means faster results — especially at scale.
Each Extract from email step extracts one table at a time. If you need data from multiple distinct tables in the same PDF, you’ll need a separate step for each.
AI extraction is powerful but not infallible — especially with complex or inconsistently formatted documents. Test with a few representative sample files and review the output in Parabola before connecting to a production destination.
One file is processed per run. If you’re parsing many PDFs over time, connect your flow to a Parabola Table or external sheet so results accumulate rather than being overwritten each run.
When you enable Email content + attachment and pull in the file_url, you get a shareable link to the original PDF. Include it in your destination data so you can always trace any row back to its source document.
If a specific column or key isn’t returning the right data, the fix usually lives at the field level — not in the general Fine-tuning section. Add descriptions, example values, and targeted instructions directly on the underperforming column or key.
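
Two of the tips above, accumulating results instead of overwriting them and keeping the file_url next to every row, can be illustrated outside Parabola with a short Python sketch. This is an assumption-laden illustration only: inside Parabola you would do this with a Parabola Table or an external sheet, and the function name `append_rows`, the CSV destination, and the column names are all hypothetical.

```python
import csv
from pathlib import Path
from typing import Dict, Iterable


def append_rows(path: Path, rows: Iterable[Dict[str, str]], file_url: str) -> int:
    """Append extracted rows to a CSV, tagging each row with its source PDF link.

    Hypothetical helper: it assumes every run produces the same columns.
    """
    tagged = [dict(row, file_url=file_url) for row in rows]
    if not tagged:
        return 0

    # Write the header only when the file is new or empty, so later runs
    # add rows instead of replacing what earlier runs stored.
    write_header = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(tagged[0].keys()))
        if write_header:
            writer.writeheader()
        writer.writerows(tagged)
    return len(tagged)


if __name__ == "__main__":
    destination = Path("extracted_rows.csv")
    # Each call stands in for one run that processed one PDF.
    append_rows(destination, [{"sku": "A1", "qty": "2"}], "https://example.com/a.pdf")
    append_rows(destination, [{"sku": "B2", "qty": "5"}], "https://example.com/b.pdf")
```

Opening the file in append mode is what lets results build up across runs, and the extra file_url column is what lets you trace any row back to the document it came from.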

Quick reference

| Constraint | Limit |
| --- | --- |
| Max file size | 500MB |
| Max pages | 30 |
| Max emails in queue | 1,000 |
| Files processed per run | 1 |
| Tables extractable per step | 1 |
| Password-protected PDFs | Not supported |
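
If PDFs pass through your own systems before they reach Parabola, a pre-check against the limits above can catch files that will not process. The sketch below is a hypothetical helper, not part of Parabola: the `PdfInfo` fields are assumed to be filled in by whatever tooling you already use to inspect PDFs, and treating 500MB as 500 × 1024 × 1024 bytes is an assumption, since the exact byte threshold is not stated here.

```python
from dataclasses import dataclass
from typing import List

# Assumption: 1 MB is counted as 1024 * 1024 bytes.
MAX_FILE_BYTES = 500 * 1024 * 1024
MAX_PAGES = 30


@dataclass
class PdfInfo:
    """Facts about a PDF gathered upstream (hypothetical structure)."""
    size_bytes: int
    page_count: int
    password_protected: bool


def limit_violations(info: PdfInfo) -> List[str]:
    """Return the limits a file breaks; an empty list means none of them."""
    problems = []
    if info.size_bytes >= MAX_FILE_BYTES:  # must be under 500MB
        problems.append("file size must be under 500MB")
    if info.page_count > MAX_PAGES:  # 30 pages or fewer
        problems.append("page count must be 30 or fewer")
    if info.password_protected:
        problems.append("password-protected PDFs are not supported")
    return problems


if __name__ == "__main__":
    invoice = PdfInfo(size_bytes=2_000_000, page_count=42, password_protected=False)
    for problem in limit_violations(invoice):
        print(problem)
```

Returning a list of problems rather than a single pass/fail flag makes it easier to tell a vendor exactly what to change before resending a file.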

What’s next

Next, we’ll look at how to handle emails that contain more than one type of PDF attachment — like an invoice and a purchase order arriving together.

Building challenge

No challenge for this lesson! Proceed to the next lesson. 
Last modified on March 5, 2026