PDFQuery - 'ascii' codec can't encode character u'\u2013'

Posted: 2017-10-18 13:22:56

Tags: python pdf pdfminer

I'm using PDFQuery to extract data from PDFs. It works for most PDFs.

Recently, for a few PDFs, I get the following errors on some of their pages:

'ascii' codec can't encode character u'\u2019' in position 91: ordinal not in range(128)

'ascii' codec can't encode character u'\u2013' in position 29: ordinal not in range(128)
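Both messages mean the same thing: some text on the page contains a non-ASCII character, and something tried to encode it with the ASCII codec. U+2019 is a curly apostrophe and U+2013 is an en dash, both with code points above 127. The following sketch reproduces the message in Python 3 (the u'...' prefix in the original messages suggests Python 2, where the wording differs only in that prefix); the helper name `describe_ascii_failure` is purely illustrative.

```python
import unicodedata

# The two characters named in the error messages above.
SAMPLES = ["\u2019", "\u2013"]

def describe_ascii_failure(text):
    """Try to encode text as ASCII; return the error message, or None on success."""
    try:
        text.encode("ascii")
        return None
    except UnicodeEncodeError as exc:
        return str(exc)

for ch in SAMPLES:
    # A code point > 127 is exactly what "ordinal not in range(128)" refers to.
    print(hex(ord(ch)), unicodedata.name(ch), ord(ch) > 127)
    print(describe_ascii_failure("It" + ch + "s"))
```

For "It\u2019s" the reported position is 2, the index of the offending character, just as "position 91" and "position 29" in the original messages point into the page text being encoded.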

My code looks like this:

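import pdfquery

# pdf_file: path to the PDF being processed (defined elsewhere)
pdf = pdfquery.PDFQuery(pdf_file)
# Total page count, read from the root of the PDF's page tree
pages_in_pdf = pdf.doc.catalog['Pages'].resolve()['Count']
for i in range(pages_in_pdf):
    try:
        pdf.load(i)
        # ... page-processing logic ...
    except ValueError as e:
        # UnicodeEncodeError is a subclass of ValueError, so it lands here
        print('Error on page number {0}. Error message is {1}'.format(i, e))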
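Because UnicodeEncodeError derives from UnicodeError, which derives from ValueError, the `except ValueError` clause above does catch these failures. A minimal sketch of narrowing the handler to encoding errors and recording which character failed on which page is shown below. It does not use pdfquery: `load_all_pages` and `fake_load` are hypothetical stand-ins for the loop and for `pdf.load`, so the snippet runs without a real PDF.

```python
# UnicodeEncodeError -> UnicodeError -> ValueError
assert issubclass(UnicodeEncodeError, ValueError)

def load_all_pages(load_page, page_count):
    """Call load_page(i) for each page; collect (page, bad_text) for encoding failures.

    load_page is a hypothetical stand-in for pdf.load.
    """
    failures = []
    for i in range(page_count):
        try:
            load_page(i)
        except UnicodeEncodeError as exc:
            # exc.object is the string being encoded; start/end bound the bad span
            failures.append((i, exc.object[exc.start:exc.end]))
    return failures

def fake_load(i):
    # Hypothetical pages: page 1 has a curly apostrophe, page 3 an en dash.
    text = {1: "It\u2019s", 3: "2016\u20132017"}.get(i, "plain")
    text.encode("ascii")

print(load_all_pages(fake_load, 5))
```

Catching UnicodeEncodeError instead of ValueError keeps unrelated ValueErrors from being silently reported as page errors, and `exc.object[exc.start:exc.end]` shows the exact character that could not be encoded.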

0 answers