Extracting string data from multi-page columns in a PDF using Python

Time: 2018-09-02 05:04:40

Tags: python pdf pdfminer

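I have some PDFs that are laid out in columns which I need to scrape. The problem is that each column spans multiple pages and does not follow the typical column layout: Column 1 continues in Column 1 of the next page instead of flowing into Column 2 on the same page. For example: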

******Column 1******************Column 2*************

Somebody once told me           Finger and her thumb
The world was gonna             In the shape of an "L"
Roll me. I ain't the            On her forehead. Well
*******************NEXT PAGE**************************
Sharpest tool in the            The years start coming
Shed. She was looking           And they don't stop coming
Kind of dumb with her

I have tried standard PDF scraping tools such as PDFMiner, but they only return a single string like the following:

Somebody once told me
 The world was gonna
 Roll me. I ain't the
 Finger and her thumb
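
A minimal sketch of the regrouping this would require, assuming pdfminer.six is installed and that the two columns can be separated by a single x-coordinate. The names `pdf_lines`, `split_columns`, and `boundary_x` are hypothetical and only used for illustration; whether PDFMiner's layout analysis actually keeps the two columns as separate text lines depends on the PDF.

```python
def pdf_lines(path):
    """Yield (page_index, x0, y_top, text) for every text line in the PDF.

    Assumes pdfminer.six; the import is local so the pure helper below
    can be used without it.
    """
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer, LTTextLine

    for page_index, layout in enumerate(extract_pages(path)):
        for element in layout:
            if not isinstance(element, LTTextContainer):
                continue
            for line in element:
                if isinstance(line, LTTextLine):
                    yield page_index, line.x0, line.y1, line.get_text().strip()


def split_columns(lines, boundary_x):
    """Group lines into [column_1, column_2], each ordered across pages.

    boundary_x is an assumed x-coordinate lying between the two columns.
    """
    columns = ([], [])
    for page, x0, y_top, text in lines:
        index = 0 if x0 < boundary_x else 1
        # PDF y-coordinates grow upward, so negate to sort top-to-bottom.
        columns[index].append((page, -y_top, text))
    return [[text for _, _, text in sorted(col)] for col in columns]
```

Usage would look something like `col1, col2 = split_columns(pdf_lines("columns.pdf"), boundary_x=300)`, where both the file name and the boundary of 300 points are placeholders that would need to be adjusted to the actual document.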

Any help would be greatly appreciated!

0 answers:

No answers yet