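I'm trying to create a Twitter bot that reads lines from a file and posts them. I'm using Python 3 and tweepy, in a virtualenv on my shared server space. This is the part of the code that seems to have trouble: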
#!/foo/env/bin/python3
import re
import tweepy, time, sys
argfile = str(sys.argv[1])
filename=open(argfile, 'r')
f=filename.readlines()
filename.close()
This is the error I get:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xfe in position 0: ordinal not in range(128)
The error specifically points to f=filename.readlines() as the source of the error. Any idea what might be wrong? Thanks.
Answer 0 (score: 20)
I think the best answer (in Python 3) is to use the errors= parameter:
with open('evil_unicode.txt', 'r', errors='replace') as f:
    lines = f.readlines()
Proof:
>>> s = b'\xe5abc\nline2\nline3'
>>> with open('evil_unicode.txt','wb') as f:
...     f.write(s)
...
16
>>> with open('evil_unicode.txt', 'r') as f:
...     lines = f.readlines()
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/codecs.py", line 319, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe5 in position 0: invalid continuation byte
>>> with open('evil_unicode.txt', 'r', errors='replace') as f:
...     lines = f.readlines()
...
>>> lines
['�abc\n', 'line2\n', 'line3']
>>>
Note that errors= can be replace or ignore. Here is what ignore looks like:
>>> with open('evil_unicode.txt', 'r', errors='ignore') as f:
...     lines = f.readlines()
...
>>> lines
['abc\n', 'line2\n', 'line3']
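For anyone who wants to reproduce the comparison outside the interactive prompt, here is a minimal sketch. read_lines is a hypothetical helper that is not part of the session above, and it passes encoding='utf-8' explicitly (my assumption) so that the result does not depend on the machine's locale default. Keep in mind that both modes discard information: replace substitutes U+FFFD for the bad byte, and ignore silently drops it.

```python
import os
import tempfile

# Same bytes as in the session above: a lone 0xe5 is not valid UTF-8.
raw = b'\xe5abc\nline2\nline3'


def read_lines(path, errors):
    """Hypothetical helper: read all lines with a chosen error handler."""
    # encoding is explicit (an assumption) so the locale default is not used
    with open(path, 'r', encoding='utf-8', errors=errors) as f:
        return f.readlines()


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'evil_unicode.txt')
    with open(path, 'wb') as f:
        f.write(raw)

    replaced = read_lines(path, 'replace')  # bad byte becomes U+FFFD
    ignored = read_lines(path, 'ignore')    # bad byte is dropped

    # 'strict' is the default handler and raises on the bad byte
    try:
        read_lines(path, 'strict')
        strict_failed = False
    except UnicodeDecodeError:
        strict_failed = True

print(replaced)
print(ignored)
print(strict_failed)
```

If the dropped or replaced characters matter for what gets posted, fixing the encoding is likely a better route than hiding the error.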
Answer 1 (score: 9)
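Your default encoding appears to be ASCII, while the input is most likely UTF-8. When the decoder hits a non-ASCII byte in the input, it throws an exception. readlines itself is not the cause of the problem; rather, it triggers the read + decode, and the decode is what fails.
It's easy to fix, though; the default open in Python 3 lets you provide the known encoding of the input, replacing the default (ASCII in your case) with any other recognized encoding. Providing it lets you keep reading str (rather than the significantly different raw binary data bytes objects), while letting Python do the work of converting the raw bytes on disk into real text data: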
# Using a with statement closes the file for us without needing to remember to close
# it explicitly, and closes it even when exceptions occur
with open(argfile, encoding='utf-8') as inf:
    f = inf.readlines()
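One caveat, offered as a guess rather than a diagnosis: the byte 0xfe reported at position 0 never occurs in valid UTF-8, but it is the first byte of the UTF-16 big-endian byte-order mark (0xFE 0xFF). If encoding='utf-8' still fails, the file may have been saved as UTF-16. The sketch below checks for a byte-order mark before choosing an encoding. guess_encoding is a hypothetical helper of my own, not something tweepy or the standard library provides, and it only recognizes files that actually start with a BOM.

```python
import codecs
import os
import tempfile


def guess_encoding(path, default='utf-8'):
    """Hypothetical helper: choose an encoding from a leading BOM, if any."""
    with open(path, 'rb') as f:
        head = f.read(4)
    if head.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    # Check UTF-32 before UTF-16: the UTF-32 LE BOM begins with the UTF-16 LE BOM.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return 'utf-32'
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    return default  # no BOM found, fall back to the assumed default


# Demonstration with a throwaway UTF-16 big-endian file (starts with 0xFE 0xFF).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'lines.txt')
    with open(path, 'wb') as f:
        f.write(codecs.BOM_UTF16_BE + 'first\nsecond\n'.encode('utf-16-be'))

    encoding = guess_encoding(path)
    with open(path, encoding=encoding) as inf:
        lines = inf.readlines()

print(encoding)
print(lines)
```

In the script from the question this would be used as open(argfile, encoding=guess_encoding(argfile)). A file without a BOM falls through to the default, so this does not replace knowing how the file was actually saved.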
Answer 2 (score: 0)