> ## Documentation Index
> Fetch the complete documentation index at: https://parabola.io/docs/llms.txt
> Use this file to discover all available pages before exploring further.

# PDF advanced settings

> Fine-tune how Parabola parses PDFs with text parsing modes, page filtering, and automatic retry logic.

## When you need more control

The default PDF parsing settings work well for most documents. But when you're working with complex PDFs (nested layouts, handwriting, large multi-page files) or production flows where consistency matters, the advanced settings give you more precise control.

***

## Text parsing approach

Controls how Parabola reads the text out of your PDF before the AI processes it.

**Auto**

Parabola selects the best parsing method based on the document. This is the right choice for most situations: start here and only change it if you're seeing issues with the output.

**OCR**

Uses a more advanced optical character recognition model. Better for:

* Documents with handwriting
* Scanned PDFs where text isn't machine-readable
* Images embedded inside PDFs

OCR mode uses a more sophisticated model and will run slower than Auto or Markdown. Use it only when Auto isn't producing clean results.

**Markdown**

Parses the PDF by converting it to Markdown format first. Generally faster than OCR, and tends to work better for:

* PDFs with nested columns and rows
* Documents where the visual layout carries meaning
* Well-structured digital PDFs (not scans)

***

## Page filtering

**Default: disabled**

If you only need data from specific pages, page filtering can meaningfully speed up your runs. The fewer pages the AI needs to read, the faster it processes.

If your document has a consistent structure (for example, header info always on page 1 and line items always on pages 2–3), locking the step to those pages eliminates unnecessary processing time on every run.
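As a rough illustration of how page selection behaves, the sketch below models the **Keep** and **Remove** actions combined with the three page selectors this section describes. It is not Parabola's implementation: the function name `select_pages` and the mode strings `first_n`, `last_n`, and `these_pages` are hypothetical, and **Autodetect** is left out because its logic is internal to Parabola.

```python
def select_pages(total_pages, action, mode, value):
    """Return the 1-based page numbers that would be parsed.

    Hypothetical model of the page filtering options, for illustration only.
    action: "keep" or "remove"
    mode:   "first_n", "last_n", or "these_pages"
    value:  an int for first_n / last_n, or a comma-separated string
            such as "1, 3, 5" for these_pages
    """
    all_pages = list(range(1, total_pages + 1))

    if mode == "first_n":
        chosen = set(all_pages[:value])
    elif mode == "last_n":
        # Guard against value == 0, since all_pages[-0:] is the whole list.
        chosen = set(all_pages[-value:]) if value > 0 else set()
    elif mode == "these_pages":
        listed = {int(p) for p in value.split(",") if p.strip()}
        # Ignore page numbers that fall outside the document.
        chosen = listed & set(all_pages)
    else:
        raise ValueError(f"unknown mode: {mode}")

    if action == "keep":
        return [p for p in all_pages if p in chosen]
    if action == "remove":
        return [p for p in all_pages if p not in chosen]
    raise ValueError(f"unknown action: {action}")


# Line items always on pages 2 and 3 of a 3-page document:
print(select_pages(3, "keep", "these_pages", "2, 3"))  # [2, 3]
# Skip a one-page cover sheet on a 5-page document:
print(select_pages(5, "remove", "first_n", 1))         # [2, 3, 4, 5]
```

Under this model, Keep and Remove are complements of each other for the same selector, which is why a short "Remove the first page" rule can replace a long "Keep these pages" list.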

When enabled, configure two things:

**Action:**

| Option         | When to use                                   |
| -------------- | --------------------------------------------- |
| **Keep**       | Parse only the pages you specify              |
| **Remove**     | Parse everything except the pages you specify |
| **Autodetect** | Let Parabola choose the most relevant pages   |

**Which pages:**

| Option                | How it works                                           |
| --------------------- | ------------------------------------------------------ |
| **The first N pages** | Input a number; parses that many pages from the start  |
| **The last N pages**  | Input a number; parses that many pages from the end    |
| **These pages**       | Input a comma-separated list, e.g., `1, 3, 5`          |

***

## Tips & best practices

* PDFs must be **under 500MB** and **30 pages or fewer**. Files that exceed either limit will not process.
* **Password-protected PDFs cannot be parsed.** Ask your vendor to remove the password before sending, or build a decryption step upstream.
* The more pages the AI reads, the longer the run. If your data appears on consistent pages across documents, use **Page Filtering** above to limit scope. Fewer pages means faster results, especially at scale.
* Each Extract from email step extracts **one table** at a time. If you need data from multiple distinct tables in the same PDF, you'll need a separate step for each.
* AI extraction is powerful but not infallible, especially with complex or inconsistently formatted documents. Test with a few representative sample files and review the output in Parabola before connecting to a production destination.
* One file is processed per run. If you're parsing many PDFs over time, connect your flow to a **Parabola Table** or external sheet so results accumulate rather than being overwritten each run.
* When you enable **Email content + attachment** and pull in the `file_url`, you get a shareable link to the original PDF. Include it in your destination data so you can always trace any row back to its source document.
* If a specific column or key isn't returning the right data, the fix usually lives at the field level, not in the general Fine-tuning section. Add descriptions, example values, and targeted instructions directly on the underperforming column or key.

### Quick reference

| Constraint                  | Limit         |
| --------------------------- | ------------- |
| Max file size               | 500MB         |
| Max pages                   | 30            |
| Max emails in queue         | 1,000         |
| Files processed per run     | 1             |
| Tables extractable per step | 1             |
| Password-protected PDFs     | Not supported |

***

## What's next

Next, we'll look at how to handle emails that contain more than one type of PDF attachment, like an invoice and a purchase order arriving together.

No challenge for this lesson! Proceed to the next lesson.