# Extracting from PDFs

> Next, we'll use the Extract from email step to digitize PDF data.

## Why PDFs are different

PDFs are designed for reading, not data processing. There's no structured row-column format to parse directly — which means extracting data from them requires a different approach than CSV or Excel.

Parabola uses AI to read your PDF and translate it into a clean, structured table. Your job is to guide that process by telling the step what data to look for and where to find it.

<Note>
  Parabola can extract data from **handwritten documents** as well. Legibility matters — heavily stylized or unclear handwriting may require additional instructions and iteration to get right.
</Note>

***

## Two types of data to extract

Before configuring the step, it helps to think about the data within your PDF in terms of two categories:

| Type                         | Description                                    | Example                                           |
| ---------------------------- | ---------------------------------------------- | ------------------------------------------------- |
| **Columns** (table data)     | Repeating values that span multiple rows       | Line items, quantities, unit prices, part numbers |
| **Keys** (individual values) | Single values that apply to the whole document | Invoice number, PO number, vendor name, date      |
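
As a rough mental model (not Parabola's internal format), the two categories can be pictured as plain Python data. The field names and values below are invented for illustration only.

```python
# Hypothetical invoice, split into the two categories described above.
# All field names and values are made up for illustration.

# Keys: one value each, applying to the whole document.
keys = {
    "Invoice Number": "4821",
    "Vendor Name": "Acme Supply",
}

# Columns: values that repeat, one set per line item (row).
columns = [
    {"Part Number": "A-100", "Quantity": 2, "Unit Price": 9.50},
    {"Part Number": "B-220", "Quantity": 1, "Unit Price": 24.00},
]

# Each key holds a single value; each column holds one value per row.
column_names = sorted(columns[0])
```

If a value would be identical on every line item, it is probably a key; if it changes from row to row, it is probably a column.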

***

## Setting up the step

Set the **file attachment type** dropdown to **PDF (with AI)**. The panel will update to show PDF-specific configuration options.

***

## Extracting a table

Expand the **Extract a table** section. You have three modes to choose from:

<Tabs>
  <Tab title="Auto-detected table (default)">
    Parabola scans the PDF and automatically identifies the most likely table, labeling its columns. This is the fastest way to start and works well for clear, consistently structured PDFs.

    After sending an email with a PDF, use the **"Use an auto-detected table"** dropdown to review all tables Parabola found in your document. If a column is missing, you can add it manually.

    <Info>
      Start with this default option, and switch modes only if it still isn't producing the results you need after some iteration.
    </Info>
  </Tab>

  <Tab title="Define a custom table">
    You manually define the columns and describe the table. Best for:

    * Documents with more than one table
    * Tables that span multiple pages
    * Cases where auto-detect isn't finding the right data

    For each column, you can provide a name, example values, and additional instructions. The name doesn't need to match what the PDF says — it just needs to be something Parabola can associate with the right data.
  </Tab>

  <Tab title="Extract all data (OCR-first)">
    Uses OCR to return all text from the PDF as raw output, with no AI structuring applied. This is a last-resort option, best used when you plan to pipe the raw output into an AI step for further processing.

    Returns data in one of four formats:

    * **All data** — every value, one per row
    * **Table data** — tables split by page, each tagged with a table ID
    * **Key-value pairs** — labeled items like "Invoice #: 4821"
    * **Raw text** — one cell per page

    <Warning>
      Only use OCR-first mode if the first two options aren't producing usable results. The output is unstructured and will require additional steps to work with.
    </Warning>
  </Tab>
</Tabs>
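
To give a sense of the extra work OCR-first output can involve, here is a minimal Python sketch of separating **Table data** rows back into individual tables using their table ID. In Parabola you would do this with flow steps rather than code, and the field names `table_id`, `page`, and `text` are hypothetical; the actual output columns may differ.

```python
from collections import defaultdict


def split_by_table(rows, id_field="table_id"):
    """Group rows into separate tables using their table ID tag."""
    tables = defaultdict(list)
    for row in rows:
        tables[row[id_field]].append(row)
    return dict(tables)


# Hypothetical OCR-first rows: two tables found across two pages.
rows = [
    {"table_id": "p1-t1", "page": 1, "text": "Widget"},
    {"table_id": "p1-t1", "page": 1, "text": "Gasket"},
    {"table_id": "p2-t1", "page": 2, "text": "Freight"},
]

tables = split_by_table(rows)
for table_id, table_rows in tables.items():
    print(table_id, len(table_rows))
```

The same idea applies whichever tool does the grouping: the table ID is what lets you tell rows from different tables apart.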

***

## Extracting individual values (keys)

Document-level values like invoice date, PO number, and total amount go in the **Extract individual values** section.

<Steps>
  <Step title="Click '+ Add key'">
    Add a new value definition.
  </Step>

  <Step title="Give it a name">
    The name doesn't need to match the PDF exactly — just make it descriptive enough that Parabola can identify the right field.
  </Step>

  <Step title="Add example values and instructions if needed">
    If the value is straightforward (e.g., "Invoice Date"), a name alone may be enough. For ambiguous fields, add an example value or extra context in the instructions field. **The more example values and instructions you provide, the more accurate your results will be.**
  </Step>
</Steps>

<Frame caption="Example of field-level fine-tuning in an Extract from email step">
  <Frame>
    <img src="https://mintcdn.com/parabola-7119dfb0/6Jf2AKOqreDrivib/images/Screenshot2026-02-27at2.32.57PM.png?fit=max&auto=format&n=6Jf2AKOqreDrivib&q=85&s=28a2a7673fb114fa5af4efe6fd6d54d5" alt="Screenshot 2026 02 27 At 2 32 57 PM" width="1200" height="1128" data-path="images/Screenshot2026-02-27at2.32.57PM.png" />
  </Frame>
</Frame>

<Tip>
  Each key you define is **repeated on every row** of the resulting dataset. This makes it easy to join line-item data with document-level metadata downstream — for example, linking each invoice line item to its invoice number and date.
</Tip>
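
The repetition of keys on every row can be sketched in a few lines of Python. This is an illustration of the resulting shape, not how Parabola implements it; the helper `attach_keys` and all sample values are hypothetical.

```python
def attach_keys(rows, keys):
    """Return new rows with every key/value pair copied onto each row."""
    return [{**keys, **row} for row in rows]


# Document-level values (keys), extracted once per PDF.
keys = {"Invoice Number": "4821", "Date": "2026-02-27"}

# Table data (columns), one dict per line item.
line_items = [
    {"Part Number": "A-100", "Quantity": 2},
    {"Part Number": "B-220", "Quantity": 1},
]

dataset = attach_keys(line_items, keys)
for row in dataset:
    print(row)
```

Every row in `dataset` now carries the invoice number and date alongside its own line-item values, so later steps can group or join on those fields without a separate lookup.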

***

## Fine-tuning

At the bottom of the panel is a **Fine-tuning** section. Use it to give the AI general context about the document or the expected output — things that aren't specific to a single column or key.

<Tip>
  Fine-tuning is for **general context**. Column-specific guidance (like "this field uses a non-standard date format" or "values here may include both letters and numbers") belongs in each column's own **Additional Instructions** field. Targeted instructions yield better results than general ones.
</Tip>

***

## Sending in data

Once you've applied some preliminary settings, you're ready to email a PDF to your Parabola step.

<Note>
  Don't worry about getting the setup perfect before sending in your PDF. It's easiest to apply a few lightweight settings first (e.g., set the file attachment type to **PDF (with AI)**, make sure the step is set to use an auto-detected table, and add some keys), and then do additional fine-tuning after the PDF is pulled in.
</Note>

***

## What's next

You know how to configure basic PDF extraction. In the next lesson, we'll cover the advanced settings — text parsing modes, page filtering, and retry behavior — for handling more complex documents.

***

<Card icon="sparkles" title="Building challenge">
  Click [here](https://parabola.io/api/clipboard/42ac0e02-04ed-4f0e-929b-c358d2e5a0e6/copy_to_flow?name=Parabola+University:+Extracting+from+PDFs) to create a fresh flow. Then...

  <Steps>
    <Step title="Add an Extract from email step to the canvas">
      Drop the step directly on the card in your flow
    </Step>

    <Step title="Change the settings to &#x22;PDF (with AI)&#x22; → &#x22;Extract a table&#x22; → &#x22;Use an auto-detected table&#x22;">
      Leave the "Columns" section blank (AI will fill this out)
    </Step>

    <Step title="In the &#x22;Keys&#x22; section, add the following values:">
      * Invoice Number
      * PO Number
      * Date
      * Signer
    </Step>

    <Step title="Send an email to the address in your step">
      Include the PDF from lesson 1 as an attachment
    </Step>

    <Step title="Fine-tune the results">
      If any fields didn't parse properly (like Signer), try adding some example values and/or field-level instructions
    </Step>
  </Steps>
</Card>

<Accordion title="Check your work">
  <Frame>
    <img src="https://mintcdn.com/parabola-7119dfb0/6Jf2AKOqreDrivib/images/Screenshot2026-02-27at2.28.52PM.png?fit=max&auto=format&n=6Jf2AKOqreDrivib&q=85&s=195755585fe6ff23320551faba649beb" alt="Screenshot2026 02 27at2 28 52PM" width="3128" height="1914" data-path="images/Screenshot2026-02-27at2.28.52PM.png" />
  </Frame>

  <Frame>
    <img src="https://mintcdn.com/parabola-7119dfb0/6Jf2AKOqreDrivib/images/Screenshot2026-02-27at2.29.33PM.png?fit=max&auto=format&n=6Jf2AKOqreDrivib&q=85&s=b124ade02be0f88e4f5fda6dec4cdba5" alt="Screenshot2026 02 27at2 29 33PM" width="3130" height="1914" data-path="images/Screenshot2026-02-27at2.29.33PM.png" />
  </Frame>

  <Frame>
    <img src="https://mintcdn.com/parabola-7119dfb0/6Jf2AKOqreDrivib/images/Screenshot2026-02-27at2.29.50PM.png?fit=max&auto=format&n=6Jf2AKOqreDrivib&q=85&s=f78c2f4be99c47135fbde1db65b9e720" alt="Screenshot2026 02 27at2 29 50PM" width="3128" height="1916" data-path="images/Screenshot2026-02-27at2.29.50PM.png" />
  </Frame>
</Accordion>
