PDF advanced settings

When you need more control

The default PDF parsing settings work well for most documents. But when you’re working with complex PDFs — nested layouts, handwriting, large multi-page files, or production flows where consistency matters — the advanced settings give you more precise control.

Text parsing approach

This setting controls how Parabola reads the text out of your PDF before the AI processes it.

Options: Auto (default), OCR, Markdown

Auto (default): Parabola selects the best parsing method based on the document. This is the right choice for most situations. Start here and only change it if you’re seeing issues with the output.

Page filtering

Default: disabled

If your document has a consistent structure — for example, header info always on page 1 and line items always on pages 2–3 — locking the step to those pages eliminates unnecessary processing time on every run.

If you only need data from specific pages, page filtering can meaningfully speed up your runs. The fewer pages the AI needs to read, the faster it processes.

When enabled, configure two things:

Action:

Option	When to use
Keep	Parse only the pages you specify
Remove	Parse everything except the pages you specify
Autodetect	Let Parabola choose the most relevant pages

Which pages:

Option	How it works
The first N pages	Input a number — parses that many pages from the start
The last N pages	Input a number — parses that many pages from the end
These pages	Input a comma-separated list, e.g., `1, 3, 5`
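
To illustrate how the Action and Which pages options combine, here is a minimal Python sketch. It is not Parabola's implementation: the function names `parse_page_list` and `select_pages`, the option strings, and the choice to ignore out-of-range page numbers are all assumptions made for this example, and Autodetect is not modeled.

```python
from typing import List, Optional


def parse_page_list(text: str) -> List[int]:
    """Turn a comma-separated list such as '1, 3, 5' into [1, 3, 5]."""
    return [int(part) for part in text.split(",") if part.strip()]


def select_pages(
    total_pages: int,
    action: str,                      # hypothetical values: "keep" or "remove"
    which: str,                       # hypothetical values: "first_n", "last_n", "these"
    n: Optional[int] = None,
    pages: Optional[List[int]] = None,
) -> List[int]:
    """Return the 1-indexed page numbers that would be parsed."""
    all_pages = list(range(1, total_pages + 1))

    if which in ("first_n", "last_n"):
        if n is None or n < 0:
            raise ValueError("n must be a non-negative integer")
        if which == "first_n":
            chosen = all_pages[:n]
        else:
            # Guard n == 0, because all_pages[-0:] would return every page.
            chosen = all_pages[-n:] if n > 0 else []
    elif which == "these":
        # Assumption: page numbers outside the document are ignored.
        chosen = [p for p in (pages or []) if 1 <= p <= total_pages]
    else:
        raise ValueError(f"unknown page selector: {which}")

    if action == "keep":
        return sorted(set(chosen))
    if action == "remove":
        excluded = set(chosen)
        return [p for p in all_pages if p not in excluded]
    raise ValueError(f"unknown action: {action}")
```

For a five-page document, Keep with "The first N pages" set to 2 parses pages 1 and 2, while Remove with "These pages" set to `1, 3, 5` parses pages 2 and 4.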

Tips & best practices

PDFs must be under 500MB and 30 pages or fewer. Files that exceed either limit will not process.

Password-protected PDFs cannot be parsed. Ask your vendor to remove the password before sending, or build a decryption step upstream.
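
If files pass through your own tooling before they reach Parabola, a pre-flight check can catch documents that would not process. The sketch below is a hypothetical helper, not a Parabola feature. It assumes you already know the file size, page count, and whether the file is password-protected, and it assumes 1MB means 1,000,000 bytes, which the limits above do not specify.

```python
from typing import List

# Limits taken from this lesson. The byte conversion is an assumption.
MAX_FILE_BYTES = 500 * 1000 * 1000   # file must be under 500MB
MAX_PAGES = 30                       # 30 pages or fewer


def precheck_pdf(size_bytes: int, page_count: int, is_password_protected: bool) -> List[str]:
    """Return a list of reasons the PDF would not process (empty if none)."""
    problems: List[str] = []
    if size_bytes >= MAX_FILE_BYTES:
        problems.append("file is not under 500MB")
    if page_count > MAX_PAGES:
        problems.append("file has more than 30 pages")
    if is_password_protected:
        problems.append("file is password-protected")
    return problems
```

An empty list means the file is within the documented limits. Any entries tell you what to fix before sending the PDF.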

Parse only the pages you need

The more pages the AI reads, the longer the run. If your data appears on consistent pages across documents, use Page Filtering above to limit scope. Fewer pages means faster results — especially at scale.

One table per step

Each Extract from email step extracts one table at a time. If you need data from multiple distinct tables in the same PDF, you’ll need a separate step for each.

Always audit your output before going live

AI extraction is powerful but not infallible — especially with complex or inconsistently formatted documents. Test with a few representative sample files and review the output in Parabola before connecting to a production destination.

Consolidate results across multiple runs

One file is processed per run. If you’re parsing many PDFs over time, connect your flow to a Parabola Table or external sheet so results accumulate rather than being overwritten each run.

Use the file URL for audit trails

When you enable Email content + attachment and pull in the file_url, you get a shareable link to the original PDF. Include it in your destination data so you can always trace any row back to its source document.
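
Conceptually, accumulating results with an audit link looks like the Python sketch below. In Parabola you would do this with a Parabola Table or sheet destination rather than code. The function name `append_run`, the row fields, and the example URLs are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, str]


def append_run(accumulated: List[Row], new_rows: List[Row], file_url: str) -> List[Row]:
    """Add this run's rows to the existing results, tagging each with its source PDF."""
    tagged = [{**row, "file_url": file_url} for row in new_rows]
    # Appending, rather than replacing, keeps rows from earlier runs.
    return accumulated + tagged


# Two runs, one file each (hypothetical data and URLs).
results: List[Row] = []
results = append_run(results, [{"invoice": "A-1", "total": "120.00"}],
                     "https://example.com/invoice-a1.pdf")
results = append_run(results, [{"invoice": "B-7", "total": "87.50"}],
                     "https://example.com/invoice-b7.pdf")
```

After both runs, `results` holds one row per invoice, and each row carries the link back to the PDF it came from.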

Use field-level instructions for better results

If a specific column or key isn’t returning the right data, the fix usually lives at the field level — not in the general Fine-tuning section. Add descriptions, example values, and targeted instructions directly on the underperforming column or key.

Quick reference

Constraint	Limit
Max file size	500MB
Max pages	30
Max emails in queue	1,000
Files processed per run	1
Tables extractable per step	1
Password-protected PDFs	Not supported

What’s next

Next, we’ll look at how to handle emails that contain more than one type of PDF attachment — like an invoice and a purchase order arriving together.

Building challenge

No challenge for this lesson! Proceed to the next lesson.

Last modified on March 30, 2026
