Pulling data - bonus lesson

Pulling from PDFs

In this lesson, we tackle one of the more challenging aspects of data automation: turning unstructured PDF data into clean data tables.

About Parabola’s PDF steps  

Parabola isn’t just another PDF parser. Parabola’s approach to PDF parsing is unique for a few reasons:

  1. The tool was built to handle variations that exist between document formats. 
  2. Our Pull from PDF file and Pull from inbound email steps pull data from documents using a blend of PDF scraping and a computer vision-enabled large language model (LLM).
  3. Setup is simple: type in what you want to extract. The documents don’t need to be in the exact same format every time.
  4. Since it’s connected to the rest of Parabola, you can do additional data transformations beyond just pulling data off of a page. 

How to use the step

As you get started working with PDFs in Parabola, here are some core concepts to understand: 

  1. Pull PDFs from emails or static files: You can work with PDFs using the Pull from PDF file and Pull from inbound email steps. If you’re using the Pull from inbound email step, you can set up an auto-forwarding rule to have emails automatically flow into Parabola.
  2. Auto-detected Tables: After scanning the documents with AI, Parabola will auto-detect Tables that exist in the document. Simply select which Table you’d like to extract, and Parabola will automatically show you all of the associated columns.
  3. Keys: Beyond columns that exist in Tables, you often need to pull document-level data points from PDFs as well, such as IDs and dates, which are often found at the top of the page. Type the additional values you need to extract into the Keys section of the settings.

Pro tip

  • Whenever you’re working with LLMs, the output improves when you provide context and examples, and Parabola is no exception. For hard-to-parse values, provide example values and a short description of where or how they appear in the document.

Visit our support docs to learn more about working with PDFs in Parabola. 

Extracting Data from PDFs in Parabola

While it’s great when vendors send clean, structured CSV or Excel files, many operators receive data in PDFs, which can be messy and difficult to process. Parabola’s Pull from PDF file and Pull from inbound email steps make it easy to extract, clean, and transform important information from PDFs.

What Makes Parabola’s PDF Extraction Unique?

  • Handles variations in formatting across different documents
  • Uses AI-powered parsing with computer vision and large language models
  • Quick setup: simply type in what you want to extract
  • Seamlessly integrates with other Parabola steps for mapping, reconciliation, and reporting

Example: Extracting Invoice Data

Let’s say we receive an invoice PDF from Parabola Logistics and need to extract:

  • Line items (charge descriptions and amounts)
  • Invoice metadata (invoice number, date, and due date)

Step 1: Auto-Detect Tables

  1. Open the Pull from PDF file step.
  2. Parabola’s AI automatically detects tables in the document.
  3. If the wrong table is selected, manually pick the correct one (e.g., “Charge Summary” instead of “Shipment Details”).
  4. Rename column headers if needed.

Step 2: Extract Key Document Information

  • Keys refer to document-level details (e.g., invoice number, dates).
  • Add Invoice Date, Due Date, and Invoice Number to extract these fields.
  • Use fine-tuning (optional):
    • Example: Specify that the Invoice Number follows a format like S0000.
    • Add instructions (e.g., "Found in top-left corner of the document") for better accuracy.
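
To make the format hint concrete, here is a minimal Python sketch of a sanity check you might run on extracted invoice numbers outside of Parabola. It is not part of Parabola and does not reflect how Parabola parses documents. It assumes that "S0000" means the letter S followed by exactly four digits; the function name and sample values are hypothetical.

```python
import re

# Assumption: "S0000" means a literal "S" followed by exactly four digits.
INVOICE_NUMBER_PATTERN = re.compile(r"S[0-9]{4}")


def looks_like_invoice_number(value: str) -> bool:
    """Return True if the value matches the assumed S0000 format."""
    # Strip surrounding whitespace, which extraction can leave behind.
    return INVOICE_NUMBER_PATTERN.fullmatch(value.strip()) is not None


# Hypothetical extracted values, used only for illustration.
samples = ["S1042", " S0007 ", "1042", "S10420"]
flags = {sample: looks_like_invoice_number(sample) for sample in samples}
```

A check like this mirrors what the example format tells the model: values that do not fit the expected shape are worth a second look.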

Step 3: Extract & Review the Data

  • Click Show Updated Results.
  • Parabola automatically extracts line items, charge amounts, invoice dates, and invoice numbers into a clean table.
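
As a rough picture of what a "clean table" can look like, the Python sketch below combines line items with document-level keys. It assumes that each key value is repeated on every line-item row; that layout, the function name, the column names, and the sample values are all illustrative assumptions, not Parabola’s actual output.

```python
from typing import Dict, List


def flatten_invoice(
    keys: Dict[str, str], line_items: List[Dict[str, str]]
) -> List[Dict[str, str]]:
    """Copy the document-level keys onto every line-item row."""
    return [{**keys, **item} for item in line_items]


# Hypothetical document-level keys (one value per document).
keys = {
    "Invoice Number": "S1042",
    "Invoice Date": "2024-03-01",
    "Due Date": "2024-03-31",
}

# Hypothetical rows from the selected table (one row per charge).
line_items = [
    {"Charge Description": "Freight", "Amount": "1200.00"},
    {"Charge Description": "Fuel surcharge", "Amount": "85.50"},
]

rows = flatten_invoice(keys, line_items)
for row in rows:
    print(row)
```

Keeping one row per line item, with the keys alongside, makes the data easy to filter, group, or join in later transformation steps.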

Next Steps: Transforming Extracted Data

Now that we’ve covered different ways to pull data into Parabola, we’ll shift focus to reshaping the extracted data with Parabola’s transformation steps.

Try It Yourself!

Test the Pull from PDF file step in the building challenge below, and let us know if you have any questions! 🚀