Python 3.6: data is jumbled when extracting it from a PDF table

Asked: 2019-04-06 21:46:20

Tags: python-3.x dataframe pdf tabula

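I am trying to extract a DataFrame from a table in a PDF using Tabula. All the data I get back is jumbled, and I cannot put it in order. Can anyone point out whether my syntax is incorrect?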

Image of the table and the output of my Python session:

[Image: the PDF table]

[Image: output of my Python session]

Code:

import camelot
from tabula import read_pdf

# Read page 6 with tabula; guess=False turns off automatic table-area detection.
a = read_pdf(
    r"C:\Users\Emege\Downloads\cencosud.pdf",
    pages=6,
    guess=False,
    encoding="ISO-8859-1",
    output_format="csv",
)

print(a)
a.to_csv("cen.csv", encoding="utf-8")

# Try the same file with camelot (it reads only page 1 unless pages is given).
b = camelot.read_pdf(r"C:\Users\Emege\Downloads\cencosud.pdf")
print(b)
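
One thing that may be worth trying, offered only as a sketch: with guess=False and no area, tabula analyzes the whole page, which can mix table cells with the surrounding text. The snippet below builds the options for tabula.read_pdf so that either tabula guesses the table region or an explicit area is passed, and it switches between lattice mode (tables with ruling lines) and stream mode (whitespace-separated tables). The helper names build_tabula_options and extract_table are hypothetical, and any area coordinates would have to be measured from the actual PDF.

```python
def build_tabula_options(page, area=None, lattice=False):
    """Assemble keyword arguments for tabula.read_pdf (hypothetical helper)."""
    options = {
        "pages": page,
        # Let tabula guess the table region only when no explicit area is given.
        "guess": area is None,
    }
    if area is not None:
        # area is (top, left, bottom, right), measured in PDF points.
        options["area"] = list(area)
    # lattice suits tables with ruling lines; stream suits whitespace-separated ones.
    options["lattice" if lattice else "stream"] = True
    return options


def extract_table(pdf_path, page, area=None, lattice=False):
    """Run tabula on one page (hypothetical helper; needs tabula-py and Java)."""
    import tabula  # imported lazily so the option builder works without it

    # Depending on the tabula-py version, this returns a DataFrame or a list of them.
    return tabula.read_pdf(pdf_path, **build_tabula_options(page, area, lattice))
```

For the file in the question, a call such as extract_table(r"C:\Users\Emege\Downloads\cencosud.pdf", 6, lattice=True) would be a first attempt; whether lattice or stream gives the cleaner result depends on how the table is drawn in the PDF.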

0 answers:

No answers yet