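I have some PDFs laid out in columns that I need to scrape. The problem is that each column runs across multiple pages, so the text doesn't follow a typical column layout. For example: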
******Column 1******************Column 2*************
Sombody once told me Finger and her thumb
The world was gonna In the shape of an "L"
Roll me. I ain't the On her forehead. Well
*******************NEXT PAGE**************************
Sharpest tool in the The years start coming
Shed. She was looking And they don't stop coming
Kind of dumb with her
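I've tried standard PDF scraping tools such as PDFMiner, but they just return a single string that reads like this:
Sombody once told me
The world was gonna
Roll me. I ain't the
Finger and her thumb
In other words, column 1 on the first page is followed immediately by column 2, instead of continuing with column 1 on the next page.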
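One possible approach, sketched under assumptions: keep PDFMiner's per-line output but also record each line's page number and bounding box, then assign every line to a column by its x position and stitch each column together across pages. This assumes exactly two columns separated by a single vertical boundary (the hypothetical `split_x` parameter) and that PDFMiner already keeps the two columns apart on each page, as the output above suggests. `extract_pages`, `LTTextContainer` and `LTTextLine` are from pdfminer.six; the `Line` tuple and both function names are made up for illustration.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple


class Line(NamedTuple):
    page: int   # 0-based page index
    x0: float   # left edge of the line's bounding box
    y1: float   # top edge (PDF origin is bottom-left, so larger y is higher up)
    text: str


def columns_from_lines(lines: Iterable[Line], split_x: float) -> list[str]:
    """Assign lines to column 0 or 1 by x position (assumes two columns split
    at the hypothetical `split_x`), then read each column page by page,
    top to bottom."""
    cols = defaultdict(list)
    for line in lines:
        cols[0 if line.x0 < split_x else 1].append(line)
    result = []
    for idx in (0, 1):
        # Sort by page first, then top-to-bottom within the page.
        ordered = sorted(cols[idx], key=lambda ln: (ln.page, -ln.y1))
        result.append(" ".join(ln.text.strip() for ln in ordered))
    return result


def read_lines(path: str) -> list[Line]:
    """Collect every text line with its page and position using pdfminer.six."""
    # Imported lazily so columns_from_lines() works without pdfminer installed.
    from pdfminer.high_level import extract_pages
    from pdfminer.layout import LTTextContainer, LTTextLine

    lines = []
    for page_no, page in enumerate(extract_pages(path)):
        for element in page:
            if isinstance(element, LTTextContainer):
                for obj in element:
                    if isinstance(obj, LTTextLine):
                        lines.append(Line(page_no, obj.x0, obj.y1, obj.get_text()))
    return lines
```

Usage would look like `columns_from_lines(read_lines("file.pdf"), split_x=306)`, where 306 pt is half the width of a US Letter page (612 pt); adjust it to wherever the gap between your columns actually falls. If PDFMiner merges text from both columns into a single line, this grouping will not fix that, and its layout parameters (`LAParams`) would need tuning first.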
Any help would be greatly appreciated!