# PDF advanced settings

> Fine-tune how Parabola parses PDFs with text parsing modes and page filtering.

## When you need more control

The default PDF parsing settings work well for most documents. For complex PDFs (nested layouts, handwriting, large multi-page files) or production flows where consistency matters, the advanced settings give you more precise control.

***

## Text parsing approach

Controls how Parabola reads the text out of your PDF before the AI processes it.

<Frame>
  <img src="https://mintcdn.com/parabola-7119dfb0/6Jf2AKOqreDrivib/images/Screenshot2026-02-27at2.38.18PM.png?fit=max&auto=format&n=6Jf2AKOqreDrivib&q=85&s=55740dc66d301ef9d956a586d0e12509" alt="Text parsing approach setting" width="1036" height="744" data-path="images/Screenshot2026-02-27at2.38.18PM.png" />
</Frame>

<Tabs>
  <Tab title="Auto (default)">
    Parabola selects the best parsing method based on the document. This is the right choice for most situations — start here and only change it if you're seeing issues with the output.
  </Tab>

  <Tab title="OCR">
    Uses a more advanced optical character recognition model. Better for:

    * Documents with handwriting
    * Scanned PDFs where text isn't machine-readable
    * Images embedded inside PDFs

    <Note>
      OCR mode uses a more sophisticated model and will run slower than Auto or Markdown. Use it only when Auto isn't producing clean results.
    </Note>
  </Tab>

  <Tab title="Markdown">
    Parses the PDF by converting it to Markdown format first. Generally faster than OCR, and tends to work better for:

    * PDFs with nested columns and rows
    * Documents where the visual layout carries meaning
    * Well-structured digital PDFs (not scans)
  </Tab>
</Tabs>
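
The guidance in the tabs above can be summarized as a rule of thumb. The Python sketch below is illustrative only: the function name and its boolean parameters are hypothetical, and it mirrors the advice on this page rather than whatever logic Parabola's Auto mode uses internally.

```python
def suggest_parsing_mode(
    auto_output_is_clean: bool,
    has_handwriting: bool = False,
    is_scanned: bool = False,
    has_embedded_images: bool = False,
    has_nested_columns: bool = False,
) -> str:
    """Rule of thumb only: restates the tab guidance, not Parabola's own logic."""
    # Start with Auto and keep it while the output looks right.
    if auto_output_is_clean:
        return "Auto"
    # Handwriting, scans, and embedded images are the cases listed for OCR.
    if has_handwriting or is_scanned or has_embedded_images:
        return "OCR"
    # Nested columns and rows are the cases listed for Markdown.
    if has_nested_columns:
        return "Markdown"
    # No listed trait applies, so stay on the default.
    return "Auto"
```

For example, a scanned invoice that Auto handles poorly would point to OCR, while a digital PDF with nested columns would point to Markdown.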

***

## Page filtering

**Default: disabled**

If you only need data from specific pages, page filtering can speed up your runs: the fewer pages the AI needs to read, the faster it processes.

<Tip>
  If your document has a consistent structure (for example, header info always on page 1 and line items always on pages 2–3), locking the step to those pages avoids unnecessary processing on every run.
</Tip>

<Frame>
  <img src="https://mintcdn.com/parabola-7119dfb0/6Jf2AKOqreDrivib/images/Screenshot2026-02-27at2.40.15PM.png?fit=max&auto=format&n=6Jf2AKOqreDrivib&q=85&s=d450593de9daa6904f22500d467f8e46" alt="Page filtering settings" width="816" height="448" data-path="images/Screenshot2026-02-27at2.40.15PM.png" />
</Frame>

When enabled, configure two things:

**Action:**

| Option         | When to use                                   |
| -------------- | --------------------------------------------- |
| **Keep**       | Parse only the pages you specify              |
| **Remove**     | Parse everything except the pages you specify |
| **Autodetect** | Let Parabola choose the most relevant pages   |

**Which pages:**

| Option                | How it works                                                      |
| --------------------- | ----------------------------------------------------------------- |
| **The first N pages** | Enter a number to select that many pages from the start           |
| **The last N pages**  | Enter a number to select that many pages from the end             |
| **These pages**       | Enter a comma-separated list of page numbers, e.g., `1, 3, 5`     |
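
To make the interaction between **Action** and **Which pages** concrete, here is a minimal Python sketch of how a selection could resolve to a list of parsed pages. It is an illustration under stated assumptions, not Parabola's implementation: the function name and argument values are hypothetical, pages are assumed to be numbered from 1, and out-of-range page numbers are assumed to be ignored. **Autodetect** is left out because this page does not describe how it chooses pages.

```python
def select_pages(total_pages: int, action: str, which: str, value) -> list:
    """Return the 1-based page numbers that would be parsed (illustrative only)."""
    all_pages = range(1, total_pages + 1)

    if which == "first":
        n = max(0, min(int(value), total_pages))  # clamp N to the document length
        chosen = set(range(1, n + 1))
    elif which == "last":
        n = max(0, min(int(value), total_pages))
        chosen = set(range(total_pages - n + 1, total_pages + 1))
    elif which == "these":
        # Parse a comma-separated list such as "1, 3, 5".
        chosen = {int(p) for p in str(value).split(",") if p.strip()}
        chosen &= set(all_pages)  # assumption: pages outside the document are ignored
    else:
        raise ValueError(f"unknown page option: {which}")

    if action == "keep":
        return sorted(chosen)  # parse only the selected pages
    if action == "remove":
        return [p for p in all_pages if p not in chosen]  # parse everything else
    raise ValueError(f"unknown action: {action}")
```

With a five-page document, `select_pages(5, "keep", "first", 2)` gives pages 1 and 2, while `select_pages(5, "remove", "last", 2)` gives pages 1 through 3.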

***

## Tips & best practices

<Warning>
  PDFs must be **under 500MB** and **30 pages or fewer**. Files that exceed either limit will not be processed.
</Warning>

<Warning>
  **Password-protected PDFs cannot be parsed.** Ask your vendor to remove the password before sending, or build a decryption step upstream.
</Warning>

<AccordionGroup>
  <Accordion title="Parse only the pages you need">
    The more pages the AI reads, the longer the run. If your data appears on consistent pages across documents, use the **Page filtering** section above to limit scope. Fewer pages means faster results, especially at scale.
  </Accordion>

  <Accordion title="One table per step">
    Each Extract from email step extracts **one table** at a time. If you need data from multiple distinct tables in the same PDF, you'll need a separate step for each.
  </Accordion>

  <Accordion title="Always audit your output before going live">
    AI extraction is powerful but not infallible — especially with complex or inconsistently formatted documents. Test with a few representative sample files and review the output in Parabola before connecting to a production destination.
  </Accordion>

  <Accordion title="Consolidate results across multiple runs">
    One file is processed per run. If you're parsing many PDFs over time, connect your flow to a **Parabola Table** or external sheet so results accumulate rather than being overwritten each run.
  </Accordion>

  <Accordion title="Use the file URL for audit trails">
    When you enable **Email content + attachment** and pull in the `file_url`, you get a shareable link to the original PDF. Include it in your destination data so you can always trace any row back to its source document.
  </Accordion>

  <Accordion title="Use field-level instructions for better results">
    If a specific column or key isn't returning the right data, the fix usually lives at the field level — not in the general Fine-tuning section. Add descriptions, example values, and targeted instructions directly on the underperforming column or key.
  </Accordion>
</AccordionGroup>
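
The tips on consolidating results and keeping the file URL can be pictured together. The Python sketch below is a conceptual illustration, not something you run inside Parabola: the `append_run` helper, the `invoice` and `total` columns, and the example URLs are all hypothetical. Only the `file_url` column name comes from this page.

```python
def append_run(accumulated: list, run_rows: list, file_url: str) -> list:
    """Add one run's rows to the accumulated results, tagged with their source PDF."""
    # Tag every row from this run with the link to its source document.
    tagged = [dict(row, file_url=file_url) for row in run_rows]
    # Return a new list so earlier runs are kept rather than overwritten.
    return accumulated + tagged


# Two runs, one file each, accumulating into the same table.
table = []
table = append_run(table, [{"invoice": "A-1", "total": 120.0}], "https://example.com/a.pdf")
table = append_run(table, [{"invoice": "B-7", "total": 75.5}], "https://example.com/b.pdf")
```

After both runs, `table` holds one row per file, and each row can be traced back to its PDF through `file_url`.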

### Quick reference

| Constraint                  | Limit         |
| --------------------------- | ------------- |
| Max file size               | 500MB         |
| Max pages                   | 30            |
| Max emails in queue         | 1,000         |
| Files processed per run     | 1             |
| Tables extractable per step | 1             |
| Password-protected PDFs     | Not supported |
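
If you control the system that sends PDFs into your flow, the file-level limits in this table can be checked before sending. The Python sketch below is a hedged example: `PdfInfo` and `precheck` are hypothetical names, the caller is assumed to already know the file's size, page count, and encryption status, and "500MB" is assumed to mean 500 × 1024 × 1024 bytes.

```python
from dataclasses import dataclass

MAX_FILE_BYTES = 500 * 1024 * 1024  # assumption: "500MB" read as 500 MiB
MAX_PAGES = 30


@dataclass
class PdfInfo:
    size_bytes: int
    page_count: int
    is_encrypted: bool


def precheck(info: PdfInfo) -> list:
    """Return the reasons a PDF would be rejected; an empty list means none found."""
    problems = []
    if info.size_bytes >= MAX_FILE_BYTES:  # the file must be strictly under the limit
        problems.append("file is not under 500MB")
    if info.page_count > MAX_PAGES:  # exactly 30 pages is still allowed
        problems.append("file has more than 30 pages")
    if info.is_encrypted:
        problems.append("file is password-protected")
    return problems
```

For instance, `precheck(PdfInfo(size_bytes=2_000_000, page_count=42, is_encrypted=False))` returns `["file has more than 30 pages"]`.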

***

## What's next

Next, we'll look at how to handle emails that contain more than one type of PDF attachment — like an invoice and a purchase order arriving together.

<Card icon="sparkles" title="Building challenge">
  No challenge for this lesson! Proceed to the next lesson.
</Card>
