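I'm trying to scrape the text from the following web-hosted PDF: http://www.cmegroup.com/delivery_reports/IssuesAndStopsReport.pdf
Any suggestions on how to do this? I explored the tm package without much luck (it doesn't recognize the path):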
> pdf.loader <- readPDF(control= list(text = "-layout"))
> rr <- pdf.loader(elem=list(uri="http://www.cmegroup.com/delivery_reports/IssuesAndStopsReport.pdf"),language="en",id="id1")
Error: Cannot handle URI 'http://www.cmegroup.com/delivery_reports/IssuesAndStopsReport.pdf'.
Error: Cannot handle URI 'http://www.cmegroup.com/delivery_reports/IssuesAndStopsReport.pdf'.
Warning messages:
1: In normalizePath(file) :
path[1]="http://www.cmegroup.com/delivery_reports/IssuesAndStopsReport.pdf": No such file or directory
2: running command ''pdftotext' -layout 'http://www.cmegroup.com/delivery_reports/IssuesAndStopsReport.pdf' -' had status 1
I also tried passing different "engine" arguments to readPDF(), but had no luck.
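The warnings hint at the cause: both normalizePath() and the pdftotext command receive the http:// URL as if it were a local file path. Below is a minimal sketch of one possible workaround. It assumes the reader accepts any local file path in elem$uri. The idea is to download the PDF to a temporary file with download.file() and pass that path to the same reader. The helper name read_pdf_from_url is hypothetical. engine = "xpdf" assumes the external pdftotext binary is installed and on the PATH.

```r
# Sketch (assumption): tm's readPDF() reader expects elem$uri to be a local
# file path, so fetch the remote PDF into a temporary file first.
# read_pdf_from_url is a hypothetical helper, not part of tm.
read_pdf_from_url <- function(url, language = "en", id = "id1") {
  # Temporary local copy of the remote PDF; mode = "wb" keeps the binary
  # file intact on Windows.
  dest <- tempfile(fileext = ".pdf")
  on.exit(unlink(dest), add = TRUE)  # delete the temp file when we return
  utils::download.file(url, destfile = dest, mode = "wb", quiet = TRUE)

  # Same reader as in the question; the "xpdf" engine shells out to pdftotext.
  pdf.loader <- tm::readPDF(engine = "xpdf", control = list(text = "-layout"))
  pdf.loader(elem = list(uri = dest), language = language, id = id)
}

# Only run the network call in an interactive session.
if (interactive()) {
  rr <- read_pdf_from_url(
    "http://www.cmegroup.com/delivery_reports/IssuesAndStopsReport.pdf")
  cat(head(as.character(rr), 20), sep = "\n")  # first 20 lines of text
}
```

The interactive() guard keeps the download from running when the script is sourced non-interactively. The temporary file can be deleted on exit because the reader has already loaded the extracted text into the returned document.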