# tabula

> Extract tables from PDF files.
> Homepage: <https://tabula.technology>.

- Extract all tables from a PDF to a CSV file:

`tabula -o {{file.csv}} {{file.pdf}}`

- Extract all tables from a PDF to a JSON file:

`tabula --format JSON -o {{file.json}} {{file.pdf}}`

- Extract tables from pages 1, 2, 3, and 6 of a PDF:

`tabula --pages {{1-3,6}} {{file.pdf}}`

- Extract tables from page 1 of a PDF, guessing which portion of the page to examine:

`tabula --guess --pages {{1}} {{file.pdf}}`

- Extract all tables from a PDF, using ruling lines to determine cell boundaries:

`tabula --spreadsheet {{file.pdf}}`

- Extract all tables from a PDF, using blank space to determine cell boundaries:

`tabula --no-spreadsheet {{file.pdf}}`