# Extracting from PDFs

> Next, we'll use the Extract from email step to digitize PDF data

## Why PDFs are different

PDFs are designed for reading, not data processing. There's no structured row-column format to parse directly, which means extracting data from them requires a different approach than CSV or Excel.

Parabola uses AI to read your PDF and translate it into a clean, structured table. Your job is to guide that process by telling the step what data to look for and where to find it.

Parabola can extract data from **handwritten documents** as well. Legibility matters: heavily stylized or unclear handwriting may require additional instructions and iteration to get right.

***

## Two types of data to extract

Before configuring the step, it helps to think about the data within your PDF in terms of two categories:

| Type                         | Description                                    | Example                                           |
| ---------------------------- | ---------------------------------------------- | ------------------------------------------------- |
| **Columns** (table data)     | Repeating values that span multiple rows       | Line items, quantities, unit prices, part numbers |
| **Keys** (individual values) | Single values that apply to the whole document | Invoice number, PO number, vendor name, date      |

***

## Setting up the step

Set the **file attachment type** dropdown to **PDF (with AI)**. The panel will update to show PDF-specific configuration options.

***

## Extracting a table

Expand the **Extract a table** section. You have three modes to choose from.

In the first mode, auto-detection, Parabola scans the PDF and automatically identifies the most likely table, labeling its columns. This is the fastest way to start and works well for clear, consistently structured PDFs.
After sending an email with a PDF, use the **"Use an auto-detected table"** dropdown to review all tables Parabola found in your document. If a column is missing, you can add it manually.

Start with this default option, and switch to another mode only if it still isn't giving you the results you want after some iteration.

In the second mode, you manually define the columns and describe the table. Best for:

* Documents with more than one table
* Tables that span multiple pages
* Cases where auto-detect isn't finding the right data

For each column, you can provide a name, example values, and additional instructions. The name doesn't need to match what the PDF says. It just needs to be something Parabola can associate with the right data.

The third mode, OCR-first, uses OCR to return all text from the PDF as raw output, with no AI structuring applied. This is a last-resort option, best used when you plan to pipe the raw output into an AI step for further processing. It returns data in one of four formats:

* **All data**: every value, one per row
* **Table data**: tables split by page, each tagged with a table ID
* **Key-value pairs**: labeled items like "Invoice #: 4821"
* **Raw text**: one cell per page

Only use OCR-first mode if the first two options aren't producing usable results. The output is unstructured and will require additional steps to work with.

***

## Extracting individual values (keys)

Document-level values like invoice date, PO number, and total amount go in the **Extract individual values** section.

Add a new value definition. The name doesn't need to match the PDF exactly. Just make it descriptive enough that Parabola can identify the right field.

If the value is straightforward (e.g., "Invoice Date"), a name alone may be enough. For ambiguous fields, add an example value or extra context in the instructions field.

**The more example values and instructions you provide, the more accurate your results will be.**

Each key you define is **repeated on every row** of the resulting dataset. This makes it easy to join line-item data with document-level metadata downstream, for example by linking each invoice line item to its invoice number and date.

***

## Fine-tuning

At the bottom of the panel is a **Fine-tuning** section. Use it to give the AI general context about the document or the expected output: things that aren't specific to a single column or key.

Fine-tuning is for **general context**. Column-specific guidance (like "this field uses a non-standard date format" or "values here may include both letters and numbers") belongs in each column's own **Additional Instructions** field. Targeted instructions yield better results than general ones.

***

## Sending in data

Once you've applied some preliminary settings, you're ready to email a PDF to your Parabola step.

Don't worry about getting the setup 100% perfect before sending in your PDF. It's easiest to apply some lightweight settings (e.g., set the file attachment type to **PDF (with AI)**, make sure it's set to use an auto-detected table, and add some keys), and then do additional fine-tuning after the PDF is pulled in.

***

## What's next

You know how to configure basic PDF extraction. In the next lesson, we'll cover the advanced settings (text parsing modes, page filtering, and retry behavior) for handling more complex documents.

***

Click [here](https://parabola.io/api/clipboard/42ac0e02-04ed-4f0e-929b-c358d2e5a0e6/copy_to_flow?name=Parabola+University:+Extracting+from+PDFs) to create a fresh flow. Then...

* Drop the step directly on the card in your flow
* Leave the "Columns" section blank (AI will fill this out)
* Add these individual values (keys):
  * Invoice Number
  * PO Number
  * Date
  * Signer
* Include the PDF from lesson 1 as an attachment
* If any fields didn't parse properly (like Signer), try adding some example values and/or field-level instructions
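
For readers who think in code, the output shape described under "Extracting individual values (keys)" can be pictured with a few lines of plain Python. This is only a sketch of the idea that each key is repeated on every row. It is not how Parabola works internally, and the function name `attach_keys`, the column names, and the sample values are all hypothetical.

```python
from typing import Any


def attach_keys(
    line_items: list[dict[str, Any]], keys: dict[str, Any]
) -> list[dict[str, Any]]:
    """Return one output row per line item, with every key copied onto it."""
    return [{**item, **keys} for item in line_items]


# Hypothetical table data (columns): repeating values, one dict per row.
line_items = [
    {"Part Number": "A-100", "Quantity": 2},
    {"Part Number": "B-200", "Quantity": 5},
]

# Hypothetical document-level values (keys): one value per document.
keys = {"Invoice Number": "4821", "PO Number": "PO-77"}

rows = attach_keys(line_items, keys)
for row in rows:
    print(row)
```

Every row in `rows` carries the same `Invoice Number` and `PO Number`, which is why line items from several documents can later be grouped or joined on those values.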