Question

How can I parse an online PDF file with Python?

I only need the second line of the first page. I need to do this without downloading the file, and I am using Python 3.5.

I have already tried something similar, without success: Using PDFMiner (Python) with online pdf files. Encode the url?

from pdfminer.pdfparser import PDFParser
import urllib.request
from io import StringIO
import io

url = 'url_with_the_pdf'

open = urllib.request.urlopen(url).read()

memoryFile = io.StringIO(open)

parser = PDFParser(memoryFile)

I get this error:

memoryFile = io.StringIO(open)
TypeError: initial_value must be str or None, not bytes

Answer 1

In Python 3, urllib.request.urlopen(url).read() returns bytes, not str, so use io.BytesIO instead of io.StringIO, i.e.

memoryFile = io.BytesIO(open)

Details: https://docs.python.org/3.0/whatsnew/3.0.html#text-vs-data-instead-of-unicode-vs-8-bit

...import the io module and use io.StringIO or io.BytesIO for text and data respectively
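
For illustration, here is a minimal sketch of the fix. The helper name load_url_into_memory is hypothetical (it is not part of urllib or PDFMiner). It only shows reading the response as bytes and wrapping it in io.BytesIO so nothing is written to disk.

```python
import io
import urllib.request


def load_url_into_memory(url):
    """Fetch `url` and return its content as a seekable in-memory binary file.

    Hypothetical helper for illustration only.
    """
    with urllib.request.urlopen(url) as response:
        data = response.read()  # bytes in Python 3, not str
    # io.BytesIO accepts bytes; io.StringIO would raise the TypeError from the question
    return io.BytesIO(data)
```

The returned object behaves like a file opened in binary mode, so it should be usable wherever the question passes memoryFile, for example PDFParser(load_url_into_memory(url)).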
