
Use PDF file

Parabola’s PDF parsing lets you pull in data from a PDF file using either of two steps:

  1. Pull from PDF file step
  2. Email attachment step

Upload PDF file

Use the Pull from PDF file step to work with a single PDF file. Upload a file by either dragging one into the outlined box or selecting "Click to upload a file."

Pull from PDF file step

At launch, you can use this step at no extra charge to your team. If usage exceeds a reasonable threshold, we may contact you to upgrade to an automation package.


Email PDF file attachment

The Email attachment step can pull in data from attached PDF files. The way that the PDF is parsed can be adjusted with the accompanying settings.

Pull PDF file from email attachment step


Pulling data from PDF files

Use AI to parse PDF files into tables of data.

Parsing settings

Once a PDF is selected, choose how to initially process the file:

  • Use an auto-detected table (default)
  • Define a custom table
  • Extract all data (not recommended)

Using the default auto-detection setting will send the file through our PDF parsing pipeline, where we will identify tables within the document, name them, and select the first table to extract possible columns from. (If the AI cannot find the exact table that you need, use the option to define a custom table.)

Once this step finishes its first calculation, you should see a table selected with a set of columns. The keys section should be empty.

Add more columns by clicking the button to add a column and then naming it. Helpful tips:

  • Column names can be descriptive or instructive, and do not need to match exactly what the PDF says. However, the name should be something that the AI can easily associate with the desired column of data.
  • Providing examples is the best way to increase the accuracy of column (or key) parsing
  • Each column can have additional instructions added to it, describing how to find the column, or how to manipulate it

Add keys to your results by clicking the button to add a key, and then defining the name of that key.

  • Keys are things that exist once in the document, or in association with a table, such as an invoice number, billing date, shipping address, or grand total amount
  • Keys also benefit from examples, as well as any additional instructions that can help the AI find the exact value that you need

Usage tips

  • This step can take many minutes to run! Grab a coffee and relax while the AI does the work for you. The more document pages that are needed for parsing, the longer it may take. Pages not associated with any values/tables are not sent for parsing.
  • In the Advanced Settings of the step, you can choose to enable your step to accept automatic updates, or ignore them. If you accept updates, the output of these steps may change as we release updates to the AI prompting.
  • If you need to pull data across multiple tables (from a single file), you will likely need multiple steps – one per table.

Columns and keys

It can be unclear whether a particular piece of data would be viewed as a column or a key.

  • In general, keys are single pieces of data that are applicable to the entire document
  • Columns are parts of tables that are likely to have more than one row associated with them
  • “Total” rows are best expressed as keys
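To make the distinction concrete, here is a minimal sketch in Python using a made-up invoice. The field names and values are hypothetical, and the flattened layout (each key repeated on every row) is an assumption for illustration, not a description of how the step formats its output.

```python
# Hypothetical invoice content, used only to illustrate keys vs. columns.
invoice = {
    # Keys: values that appear once and apply to the whole document.
    "keys": {
        "Invoice number": "INV-1001",
        "Billing date": "2024-01-15",
        "Grand total": "59.00",
    },
    # Columns: fields of a table, with one value per line-item row.
    "rows": [
        {"Item": "Red T-shirt", "Quantity": "2", "Price": "19.50"},
        {"Item": "Blue cap", "Quantity": "1", "Price": "20.00"},
    ],
}


def flatten(doc):
    """Repeat every key on each table row (an assumed output layout)."""
    return [{**row, **doc["keys"]} for row in doc["rows"]]


table = flatten(invoice)
for row in table:
    print(row)
```

Note that the grand total sits with the keys rather than in the rows, matching the advice that “Total” rows are best expressed as keys.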

Columns that are shown after auto-detection do not always represent every possible column. Add more columns to match what you want the output to look like.

It is very helpful to rename columns and keys to indicate to the AI what data you need:

  • The name can be very specific and used to find the exact data, or even manipulate data!
  • For example, if an “Item” column contains both an ID and a description (e.g. “Red T-shirt #1494827”), you can create two columns, named “Item description” and “Item ID”. The AI will then attempt to split the “Item” column into two columns, based on those names.
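The intended result of that split can be sketched in Python. This is only an illustration of the expected output: the step relies on AI rather than a fixed pattern, and the `split_item` helper and its regular expression are hypothetical.

```python
import re


def split_item(item):
    """Split 'description #id' into (description, id); id is None if absent."""
    match = re.fullmatch(r"(.*?)\s*#(\d+)", item.strip())
    if match is None:
        return item.strip(), None
    return match.group(1), match.group(2)


description, item_id = split_item("Red T-shirt #1494827")
print({"Item description": description, "Item ID": item_id})
```

A value without an ID, such as "Blue cap", comes back with the description unchanged and no ID, which is a reminder to audit rows where the source data does not follow the expected shape.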

Giving examples for columns and keys is one of the best ways to increase accuracy.

  • The only exception is multi-line data points, like an address. If your example spans multiple lines, type it out in the fine tuning for that column instead.

Mark columns as “Child columns” if they contain rows that have values unique from the parent columns:

[Images: a table before and after marking “Size” as a child column]

Extract all data

Choosing this option will use OCR to process your file, not AI.

You can choose the desired data format based on how you plan to transform the data in your Parabola flow. From the “Data format” dropdown, you have the following options:

  1. All data: this will return all of the PDF data, organized into rows
  2. Table data: this will return only data from identified tables within the PDF file. If your file has multiple tables, each will have a unique ID (which you can use to later filter results, for example), and results will be returned sequentially (e.g. table 1, then table 2, and so on). Note: tables that span multiple pages will be broken into individual tables for each page
  3. Key-Value pairs: this will return all identifiable key/value pairs – things that are clearly associated or labeled, such as “color: red” or “Customer name- Parabola”
  4. Raw text: this will return all of the PDF data, in a single cell (one cell per file page). This format is most useful if you plan to apply an AI step, like Extract or Categorize

For the “Table data” and “Key-Value pairs” formats, you can automatically pivot your results using the checkbox that appears in the step settings.
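As a rough picture of what pivoting key/value pairs means, the sketch below turns one row per pair into a single wide row with one column per key. The `Key` and `Value` field names are assumptions for illustration; the exact column names and layout the step produces are not specified here.

```python
def pivot_pairs(pairs):
    """Turn rows of key/value pairs into one wide row keyed by name."""
    wide = {}
    for pair in pairs:
        wide[pair["Key"]] = pair["Value"]
    return wide


# Hypothetical unpivoted results: one row per key/value pair.
pairs = [
    {"Key": "color", "Value": "red"},
    {"Key": "Customer name", "Value": "Parabola"},
]

wide_row = pivot_pairs(pairs)
print(wide_row)
```

In this sketch, a key that appears more than once keeps only its last value, so it is worth checking for repeated keys before relying on pivoted results.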

Limitations

  • File size: PDF files must be <500 MB and 3,000 pages
  • Languages supported: English, French, German, Italian, Portuguese, and Spanish
  • PDFs cannot be password protected
  • The maximum height and width are 40 inches, or 2,880 points
  • The minimum height for text to be detected is 15 pixels (~8 point font)
  • We recommend always auditing the results returned in Parabola to ensure that they’re complete
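If you want to screen files before uploading them, the limits above can be expressed as a small pre-flight check. This is a sketch under stated assumptions: the function name and parameters are hypothetical, 1 MB is taken to be 1,048,576 bytes, a file with exactly 3,000 pages is treated as allowed, and you supply the page count and page dimensions yourself (for example, from a PDF library of your choice).

```python
POINTS_PER_INCH = 72
MAX_BYTES = 500 * 1024 * 1024            # assumes 1 MB = 1,048,576 bytes
MAX_PAGES = 3000                         # assumes exactly 3,000 is allowed
MAX_SIDE_POINTS = 40 * POINTS_PER_INCH   # 40 inches = 2,880 points


def check_pdf_limits(size_bytes, page_count, width_pt, height_pt,
                     password_protected=False):
    """Return a list of limit violations; an empty list means none found."""
    problems = []
    if size_bytes >= MAX_BYTES:
        problems.append("file is 500 MB or larger")
    if page_count > MAX_PAGES:
        problems.append("file has more than 3,000 pages")
    if max(width_pt, height_pt) > MAX_SIDE_POINTS:
        problems.append("page is larger than 40 inches (2,880 points)")
    if password_protected:
        problems.append("file is password protected")
    return problems


# A 2 MB, 12-page US Letter file (612 x 792 points) passes every check.
print(check_pdf_limits(2 * 1024 * 1024, 12, 612, 792))
```

The check does not cover language support or minimum text height, which depend on the content of the pages rather than on file properties.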
