Pulling from PDFs

In this lesson, we tackle one of the more challenging aspects of data automation: turning unstructured PDF data into clean, structured data tables.

Building challenge

  1. Download this PDF file
  2. Add a card to the canvas and name it “Pull from PDF invoice.” 
  3. Drag a Pull from PDF file step onto the canvas and upload the invoice PDF. 
  4. Click Show updated results (note: it may take a minute or two to load). 
  5. Pull the charges description Table as well as the invoice date, invoice due date, and invoice number keys. 

To check your work, take a look at this quick video.

About Parabola’s PDF steps  

Parabola isn’t just another PDF parser. Parabola’s approach to PDF parsing is unique for a few reasons:

  1. The tool was built to handle variations that exist between document formats. 
  2. Our Pull from PDF file and Pull from inbound email steps pull data from documents using a blend of PDF scraping and a computer vision-enabled large language model (LLM).
  3. To set up parsing, you type in what you want to extract, and the documents don’t need to be in the exact same format every time.
  4. Since it’s connected to the rest of Parabola, you can do additional data transformations beyond just pulling data off of a page. 

How to use the step

As you get started working with PDFs in Parabola, here are some core concepts to understand: 

  1. Pull PDFs from emails or static files: You can work with PDFs using the Pull from PDF file and Pull from inbound email steps. If you’re using the Pull from inbound email step, you can set up an auto-forwarding rule to have emails automatically flow into Parabola.
  2. Auto-detected Tables: After scanning the documents with AI, Parabola will auto-detect Tables that exist in the document. Simply select which Table you’d like to extract, and Parabola will automatically show you all of the associated columns.
  3. Keys: Beyond columns that exist in Tables, you often need to pull document-level data points from PDFs as well, such as IDs and dates, which are often found at the top of the page. Type the additional values you need to extract into the Keys section of the settings.
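
To make the difference between Tables and Keys concrete, here is a minimal Python sketch of the two kinds of data. It is an illustration only, not Parabola's API or its actual output format: the invoice values, the field names, and the attach_keys helper are all hypothetical, and the idea that each Key ends up repeated on every Table row is an assumption made for the example.

```python
# Hypothetical values for illustration only; not taken from a real invoice
# and not Parabola's actual output format.

# A "Table": one record per line item found in the document.
table_rows = [
    {"description": "Ocean freight", "amount": "1200.00"},
    {"description": "Customs fee", "amount": "85.50"},
]

# "Keys": document-level values that appear once, often at the top of the page.
keys = {
    "invoice_number": "INV-001",
    "invoice_date": "2024-01-15",
    "invoice_due_date": "2024-02-14",
}


def attach_keys(rows, document_keys):
    """Return new rows where each table row also carries the document-level keys."""
    return [{**row, **document_keys} for row in rows]


combined = attach_keys(table_rows, keys)
for row in combined:
    print(row)
```

In this sketch, each line item keeps its own description and amount, while the invoice number and dates are the same on every row because they describe the document as a whole.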

Pro tip

  • Whenever you’re working with LLMs, the output improves when you add context and examples, and Parabola is no exception. To improve results, provide example values and additional context for hard-to-parse values.

Visit our support docs to learn more about working with PDFs in Parabola. 
