Question

Python 3.1.2 (r312:79147, Nov  9 2010, 09:41:54)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> open("/home/madsc13ntist/test_file.txt", "r").readlines()[6]
Traceback (most recent call last):
  File "", line 1, in 
  File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xae in position 2230: unexpected code byte

然而......

Python 2.4.3 (#1, Sep  8 2010, 11:37:47)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> open("/home/madsc13ntist/test_file.txt", "r").readlines()[6]
'2010-06-14 21:14:43 613 xxx.xxx.xxx.xxx 200 TCP_NC_MISS 4198 635 GET http www.thelegendssportscomplex.com 80 /thumbnails/t/sponsors/145x138/007.gif - - - DIRECT www.thelegendssportscomplex.com image/gif http://www.thelegendssportscomplex.com/ "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; InfoPath.1; MS-RTC LM 8)" OBSERVED "Sports/Recreation" - xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx\r\n'

有没有人知道为什么.readlines（）[6]不适用于python-3但在2.4中有用吗？

也......我以为0xAE是®

Answer 1

来自Python wiki：

UnicodeDecodeError通常在从特定编码解码str字符串时发生。由于编码只将有限数量的str字符串映射到unicode字符，因此str字符的非法序列将导致编码特定的decode（）失败

看起来你的编码与你想象的不同。

Answer 2

open功能doc：

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

永远使用编码读取文件：

open("/home/madsc13ntist/test_file.txt", "r",encoding='iso8859-1').readlines()[6]

忽略解码错误？设置错误='忽略'。 'errors'的默认值为'None'，与'strict'相同。

python3：readlines（）索引问题？

2 个答案: