Pulling from PDFs
In this lesson, we tackle one of the more challenging aspects of data automation — turning unstructured PDF data into nice, clean data tables.
About Parabola’s PDF steps
Parabola isn’t just another PDF parser. Parabola’s approach to PDF parsing is unique for a few reasons:
- The tool was built to handle variations that exist between document formats.
- Our Pull from PDF file and Pull from inbound email steps pull data from documents using a blend of PDF scraping and a computer vision-enabled large language model (LLM).
- Setup is simple: type in what you want to extract. Your documents don’t need to be in the exact same format every time.
- Since it’s connected to the rest of Parabola, you can do additional data transformations beyond just pulling data off of a page.
How to use the step
As you get started working with PDFs in Parabola, here are some core concepts to understand:
- Pull PDFs from emails or static files: You can work with PDFs using the Pull from PDF file and Pull from inbound email steps. If you’re using the Pull from inbound email step, you can set up an auto-forwarding rule to have emails automatically flow into Parabola.
- Auto-detected Tables: After scanning the documents with AI, Parabola will auto-detect Tables that exist in the document. Simply select which Table you’d like to extract, and Parabola will automatically show you all of the associated columns.
- Keys: Beyond columns that exist in Tables, you often need to pull document-level data points from PDFs as well, such as IDs and dates, which are often found at the top of the page. Simply type in the additional values you need to extract in the Keys section of the settings.
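To picture how Tables and Keys relate, here is a minimal Python sketch. It is an illustration only, not Parabola's API or internal format: the `flatten_document` helper, the field names, and the sample values are all hypothetical. It assumes the document-level values are repeated on every row of the extracted table, so that each row carries its own invoice ID and date.

```python
# Hypothetical illustration only: this is not Parabola's API or internal format.

def flatten_document(keys, table_rows):
    """Attach document-level keys to every row of an extracted table.

    If a row has a column with the same name as a key, the row's value wins.
    """
    return [{**keys, **row} for row in table_rows]


# Document-level values, typically found once near the top of the page.
invoice_keys = {"invoice_id": "INV-1001", "invoice_date": "2024-03-15"}

# Rows from one detected table, for example the invoice line items.
line_items = [
    {"sku": "A-100", "quantity": 2, "unit_price": 9.50},
    {"sku": "B-200", "quantity": 1, "unit_price": 24.00},
]

rows = flatten_document(invoice_keys, line_items)
for row in rows:
    print(row)
```

Each printed row combines the table columns with the document-level keys, which is the flat, table-shaped result you would typically want before doing further transformations.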
Pro tip
- Whenever you’re working with LLMs, you’ll get better output when you provide context and examples, and Parabola is no exception. To improve results, provide example values and additional context for hard-to-parse values.
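As a rough sketch of what "context and examples" can look like, the Python snippet below compares a bare field name with a more descriptive one. The `FieldHint` class, its attributes, and the sample PO numbers are hypothetical and exist only for illustration. In Parabola you would type this kind of description into the step settings rather than write code.

```python
from dataclasses import dataclass, field


@dataclass
class FieldHint:
    """Hypothetical description of one value to extract from a document."""
    name: str
    context: str = ""
    examples: list = field(default_factory=list)

    def describe(self) -> str:
        # Build a single human-readable description from the parts provided.
        parts = [self.name]
        if self.context:
            parts.append(f"({self.context})")
        if self.examples:
            parts.append("e.g. " + ", ".join(self.examples))
        return " ".join(parts)


# A bare name leaves the model to guess what you mean.
bare = FieldHint("PO number")

# Added context and example values narrow down what to look for.
detailed = FieldHint(
    "PO number",
    context="purchase order reference printed near the top right of the page",
    examples=["PO-48213", "PO-50077"],
)

print(bare.describe())
print(detailed.describe())
```

The second description tells the model where the value usually sits and what it looks like, which is the kind of extra detail that tends to help with hard-to-parse values.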
Visit our support docs to learn more about working with PDFs in Parabola.