
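I am trying to extract data from https://www.cia.gov/library/publications/world-leaders-1/pdfs/2013/September2013ChiefsDirectory.pdf. I need the titles and the names as separate fields.

I tried extracting them with the tabula-py package. Please let me know if there are other packages that could do this. My constraints are that I must use Python and must not use OCR.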

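from tabula import read_pdf

# stream=True selects stream-mode extraction (for tables without ruling lines);
# header=None makes pandas treat the first row as data instead of column names.
df = read_pdf('./September2013ChiefsDirectory.pdf', pages='all', guess=True, stream=True, pandas_options={'header': None})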
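
One possible alternative that stays in Python and avoids OCR is to read the PDF's text layer with pdfplumber and split each line into a title and a name. The sketch below is only a starting point. It assumes that layout-preserving extraction leaves a run of two or more spaces between the title and the name on each line, which has not been verified against this particular PDF. The helper names `split_title_and_name` and `extract_pairs` are hypothetical.

```python
import re

# Assumption: title and name are separated by a run of 2+ whitespace characters.
_GAP = re.compile(r"\s{2,}")


def split_title_and_name(line):
    """Return (title, name), or None if the line has no wide gap."""
    parts = _GAP.split(line.strip(), maxsplit=1)
    if len(parts) != 2 or not all(parts):
        return None
    return parts[0], parts[1]


def extract_pairs(pdf_path):
    """Collect (title, name) pairs from every page of the PDF."""
    import pdfplumber  # third-party package, imported lazily

    pairs = []
    with pdfplumber.open(pdf_path) as pdf:
        for page in pdf.pages:
            # layout=True asks pdfplumber to approximate the page's spacing;
            # extract_text can return None for a page without text.
            text = page.extract_text(layout=True) or ""
            for line in text.splitlines():
                pair = split_title_and_name(line)
                if pair is not None:
                    pairs.append(pair)
    return pairs
```

Calling `extract_pairs('./September2013ChiefsDirectory.pdf')` would return a list of tuples that can be passed to `pandas.DataFrame(pairs, columns=['title', 'name'])`. Lines without a wide gap, such as country headings, are skipped. If a page holds several columns side by side, the page would likely need to be cropped per column first.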


0 answers: