
DeepDive Table Examples

This example demonstrates how to extract tabular data from PDF documents. The lib/totable.jar tool uses computer vision techniques to recognize tables and outputs them as XML markup. The DeepDive extractor parses this XML and populates a database relation with the table data. The relation has the following schema:

CREATE TABLE table_cells(
  id bigint,
  filename text,
  page int,
  table_id int,
  row int,
  column_start int,
  column_end int,
  content text,
  pdf_coordinates text
);
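The exact XML emitted by lib/totable.jar is not documented in this README, so the following is only a sketch of what the extractor step does. It assumes a hypothetical layout (a document element with a filename attribute, containing page, table, and cell elements whose attributes are row, column_start, column_end, and coordinates) and flattens it into tuples in the column order of table_cells. The sample markup, the attribute names, and the use of a simple counter for id are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from typing import Iterator, Tuple

# Hypothetical markup: the real totable.jar output format may differ.
SAMPLE_XML = """<document filename="report.pdf">
  <page number="3">
    <table id="0">
      <cell row="0" column_start="0" column_end="1" coordinates="72,700,300,715">Revenue</cell>
      <cell row="1" column_start="0" column_end="0" coordinates="72,680,180,695">2013</cell>
      <cell row="1" column_start="1" column_end="1" coordinates="190,680,300,695">1.2B</cell>
    </table>
  </page>
</document>"""

# (id, filename, page, table_id, row, column_start, column_end, content, pdf_coordinates)
Row = Tuple[int, str, int, int, int, int, int, str, str]


def xml_to_rows(xml_text: str, first_id: int = 0) -> Iterator[Row]:
    """Yield one tuple per cell, in the column order of table_cells."""
    root = ET.fromstring(xml_text)
    filename = root.get("filename", "")
    next_id = first_id  # assumption: ids are assigned sequentially here
    for page in root.iter("page"):
        page_no = int(page.get("number"))
        for table in page.iter("table"):
            table_id = int(table.get("id"))
            for cell in table.iter("cell"):
                yield (
                    next_id,
                    filename,
                    page_no,
                    table_id,
                    int(cell.get("row")),
                    int(cell.get("column_start")),
                    int(cell.get("column_end")),
                    (cell.text or "").strip(),
                    cell.get("coordinates", ""),
                )
                next_id += 1


if __name__ == "__main__":
    for row in xml_to_rows(SAMPLE_XML):
        print(row)
```

Each tuple could then be bulk-loaded into table_cells; keeping the tuple order identical to the CREATE TABLE column order avoids naming columns in the INSERT.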

TableCell (in udf/table_cell.py) can be reused in your own applications to handle tabular data.
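To illustrate how rows of table_cells can be turned back into a table, here is a minimal sketch. The Cell class below is a hypothetical stand-in written for this example, not the TableCell from udf/table_cell.py, whose interface may differ. It assumes row and column indices are zero-based and that column_end is inclusive, so a cell with column_start=0 and column_end=1 spans two columns; both assumptions should be checked against real extractor output.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Cell:
    # Hypothetical stand-in mirroring a subset of the table_cells columns.
    row: int
    column_start: int
    column_end: int  # assumed inclusive
    content: str


def to_grid(cells: List[Cell]) -> List[List[Optional[str]]]:
    """Rebuild a 2-D grid; a spanning cell's content fills every column it covers."""
    if not cells:
        return []
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.column_end for c in cells) + 1
    # Positions with no cell stay None.
    grid: List[List[Optional[str]]] = [[None] * n_cols for _ in range(n_rows)]
    for c in cells:
        for col in range(c.column_start, c.column_end + 1):
            grid[c.row][col] = c.content
    return grid


if __name__ == "__main__":
    cells = [
        Cell(0, 0, 1, "Revenue"),
        Cell(1, 0, 0, "2013"),
        Cell(1, 1, 1, "1.2B"),
    ]
    for line in to_grid(cells):
        print(line)
```

Copying a spanning cell's content into every column it covers makes column-wise lookups simple; leaving the extra positions empty is an equally valid choice if duplicated values would confuse downstream code.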

Running

Edit the db.url file to match your database configuration.
Run deepdive initdb, then deepdive run.
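A typo in db.url is an easy way for deepdive initdb to fail. The sketch below is a quick sanity check to run beforehand; it assumes db.url holds a single PostgreSQL URL of the form postgresql://user@host:port/dbname, and the fallback example URL is hypothetical. It only inspects the URL and does not connect to the database.

```python
from pathlib import Path
from urllib.parse import urlsplit

# Assumption: a PostgreSQL URL; adjust if your setup uses another scheme.
ALLOWED_SCHEMES = ("postgresql",)


def check_db_url(url: str) -> dict:
    """Split a database URL into its parts; raise ValueError if it looks wrong."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ALLOWED_SCHEMES:
        raise ValueError(f"unexpected scheme: {parts.scheme!r}")
    dbname = parts.path.lstrip("/")
    if not dbname:
        raise ValueError("database name is missing")
    return {
        "user": parts.username,               # None if no user is given
        "host": parts.hostname or "localhost",
        "port": parts.port or 5432,           # PostgreSQL's default port
        "dbname": dbname,
    }


if __name__ == "__main__":
    path = Path("db.url")
    if path.exists():
        print(check_db_url(path.read_text()))
    else:
        # Hypothetical example URL, used only when db.url is absent.
        print(check_db_url("postgresql://deepdive@localhost:5432/deepdive_table"))
```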

Notes

This example application does not perform any inference. Its purpose is to show how to use extractors to extract tables from PDF documents.
