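I am trying to convert PDF data to text. The path of the file is in pdf_path, and I am using subprocess. I need to keep the same approach because I am only converting the code from Python 2 to Python 3. This code works fine on Python 2 but not on Python 3. I want to keep the same way of coding, but I need to know what the problem is. I am using pdfminer.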
text = subprocess.check_output(["pdf2txt.py", pdf_path])
The error I am facing is as follows:
Traceback (most recent call last):
  File "/usr/local/bin/pdf2txt.py", line 115, in <module>
    if __name__ == '__main__': sys.exit(main(sys.argv))
  File "/usr/local/bin/pdf2txt.py", line 108, in main
    caching=caching, check_extractable=True):
  File "/usr/local/lib/python3.6/dist-packages/pdfminer/pdfpage.py", line 122, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "/usr/local/lib/python3.6/dist-packages/pdfminer/pdfdocument.py", line 583, in __init__
    raise PDFSyntaxError('No /Root object! - Is this really a PDF?')
pdfminer.pdfparser.PDFSyntaxError: No /Root object! - Is this really a PDF?
subprocess.CalledProcessError: Command '['pdf2txt.py', '/tmp/temp.pdf']' returned non-zero exit status 1.
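
One way to narrow this down, as a minimal sketch rather than a confirmed fix: the PDFSyntaxError suggests pdfminer does not see a valid PDF at /tmp/temp.pdf, so it may help to check that the file actually contains the "%PDF-" header before calling pdf2txt.py. A common cause after a Python 2 to Python 3 port is writing the temporary file in text mode instead of binary mode, but that is only an assumption here because the code that creates the file is not shown. The helper names looks_like_pdf and pdf_to_text are hypothetical, and decoding as UTF-8 assumes pdf2txt.py is run with its default output codec. Note also that in Python 3 subprocess.check_output returns bytes, not str.

```python
import subprocess


def looks_like_pdf(path):
    """Return True if the '%PDF-' marker appears in the first 1024 bytes."""
    # Read in binary mode: a PDF is binary data, not text.
    with open(path, "rb") as f:
        return b"%PDF-" in f.read(1024)


def pdf_to_text(pdf_path):
    """Run pdf2txt.py on pdf_path and return its output as a str."""
    if not looks_like_pdf(pdf_path):
        # Fail early with a clearer message than the pdfminer traceback.
        raise ValueError("%s does not look like a PDF file" % pdf_path)
    # In Python 3, check_output returns bytes.
    raw = subprocess.check_output(["pdf2txt.py", pdf_path])
    # Assumption: pdf2txt.py wrote UTF-8, so decode accordingly.
    return raw.decode("utf-8")
```

If looks_like_pdf returns False for /tmp/temp.pdf, the problem is probably in how that file is written (for example open(path, "w") instead of open(path, "wb")), not in the subprocess call itself.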