How do I decode a PDF file downloaded from a website in Python?

Time: 2017-02-19 21:51:44

Tags: python pdf

Here is the code:

#!/usr/bin/python
import codecs
import urllib.request
resp = urllib.request.urlretrieve('http://normanpd.normanok.gov/filebrowser_download/657/2017-02-16%20Daily%20Incident%20Summary.pdf', 'test.pdf')
with codecs.open("test.pdf") as f:
    for line in f:
        line.decode('utf-8')

        print(line)

After running the code above, I get the following error:

Traceback (most recent call last):
  File "normanpd.py", line 6, in <module>
    for line in f:
  File "/usr/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xb5 in position 11: invalid start byte

Please help me fix this.

1 answer:

Answer 0 (score: 0)

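What makes you think the file is an encoded string? It isn't a string at all: a PDF is a binary format, not human-readable text. Because `codecs.open` is called without an encoding, it opens the file in text mode, so Python tries to decode the bytes as UTF-8 while iterating and fails at the first byte that isn't valid UTF-8 (here 0xb5). You can't just iterate over the file line by line and print it.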
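A minimal sketch of treating the file as binary instead: open it in `'rb'` mode (which yields `bytes` and never decodes), check for the `%PDF-` signature, and reproduce the decode failure on purpose. The function names and `SAMPLE_HEADER` are hypothetical, chosen for illustration; the sample bytes are made up and are not taken from the actual Norman PD file.

```python
"""Treat a PDF as binary data: read bytes, check the signature, and show
why decoding it as UTF-8 text fails. (Hypothetical helper names.)"""


def read_pdf_bytes(path):
    """Read the whole file in binary mode ('rb').

    Binary mode returns ``bytes`` and performs no text decoding, so byte
    values such as 0xb5 cannot raise UnicodeDecodeError here.
    """
    with open(path, "rb") as f:
        return f.read()


def looks_like_pdf(data):
    """A PDF file normally begins with the ASCII signature b'%PDF-'."""
    return data.startswith(b"%PDF-")


def first_invalid_utf8_offset(data):
    """Return the offset of the first byte that breaks UTF-8 decoding,
    or None if ``data`` happens to be valid UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start  # index of the offending byte
    return None


# Hypothetical header bytes for illustration only: an ASCII signature line
# followed by a comment line made of high (non-ASCII) bytes.
SAMPLE_HEADER = b"%PDF-1.5\r\n%\xb5\xb5\xb5\xb5\r\n"
```

With the file saved by the question's `urlretrieve` call, `data = read_pdf_bytes("test.pdf")` followed by `looks_like_pdf(data)` confirms what was downloaded. Printing `data` would still show raw PDF syntax and compressed streams rather than the report's text; getting readable text requires a PDF parser, for example the third-party pypdf package (`PdfReader("test.pdf")`, then `page.extract_text()` for each entry in `reader.pages`).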