Python 3.1.2 (r312:79147, Nov 9 2010, 09:41:54)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> open("/home/madsc13ntist/test_file.txt", "r").readlines()[6]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.1/codecs.py", line 300, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xae in position 2230: unexpected code byte
And yet...
Python 2.4.3 (#1, Sep 8 2010, 11:37:47)
[GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> open("/home/madsc13ntist/test_file.txt", "r").readlines()[6]
'2010-06-14 21:14:43 613 xxx.xxx.xxx.xxx 200 TCP_NC_MISS 4198 635 GET http www.thelegendssportscomplex.com 80 /thumbnails/t/sponsors/145x138/007.gif - - - DIRECT www.thelegendssportscomplex.com image/gif http://www.thelegendssportscomplex.com/ "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 2.0.50727; InfoPath.1; MS-RTC LM 8)" OBSERVED "Sports/Recreation" - xxx.xxx.xxx.xxx xxx.xxx.xxx.xxx\r\n'
Does anyone know why .readlines()[6] fails in Python 3 but works in 2.4?
Also... I thought 0xAE was ®.
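Answer 0 (score: 0)
From the Python wiki:
The UnicodeDecodeError normally happens when decoding a str string from a certain encoding. Since encodings map only a limited number of str strings to unicode characters, an illegal sequence of str characters will cause the encoding-specific decode() to fail.
It looks like your file's encoding is not what you think it is.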
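To make that concrete, here is a minimal sketch. The byte string is a made-up sample, not the actual line from the log file. It shows that 0xAE is indeed ® in ISO-8859-1, but that a lone 0xAE byte is not valid UTF-8, where ® is encoded as the two bytes 0xC2 0xAE. Python 2's open() returns raw byte strings without decoding them, which is most likely why the same call did not fail there, whereas Python 3 text mode decodes with a default encoding (UTF-8 in the traceback above).

```python
# Hypothetical sample bytes containing a lone 0xAE, as in the error message.
raw = b"MS-RTC \xae LM 8"

# ISO-8859-1 (Latin-1) maps every byte to a character, so this always succeeds.
latin1_text = raw.decode("iso8859-1")

# UTF-8 rejects 0xAE here because it is a continuation byte with no lead byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError as exc:
    utf8_ok = False
    bad_byte_position = exc.start  # index of the offending byte

# In UTF-8 the registered sign takes two bytes.
registered_utf8 = "\u00ae".encode("utf-8")

print(latin1_text)
print(utf8_ok, bad_byte_position)
print(registered_utf8)
```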
Answer 1 (score: 0)
From the documentation of the open function:
open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)
Always specify an encoding when reading a file:
open("/home/madsc13ntist/test_file.txt", "r", encoding='iso8859-1').readlines()[6]
Want to ignore decoding errors? Set errors='ignore'. The default value of 'errors' is None, which has the same effect as 'strict'.
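The difference between these options can be sketched as follows. This is only an illustration under stated assumptions: the file is a temporary one created on the spot, and its contents are a made-up sample rather than the original log. It reads the same bytes three ways: with the matching encoding, with errors='ignore', and with errors='replace'.

```python
import os
import tempfile

# Hypothetical two-line sample; the second line contains a lone 0xAE byte.
sample = b"line one\nMS-RTC \xae LM 8\n"

fd, path = tempfile.mkstemp()
try:
    with os.fdopen(fd, "wb") as f:
        f.write(sample)

    # Correct encoding: the byte is decoded to the registered sign.
    with open(path, "r", encoding="iso8859-1") as f:
        decoded = f.readlines()[1]

    # Wrong encoding, errors ignored: the undecodable byte is silently dropped.
    with open(path, "r", encoding="utf-8", errors="ignore") as f:
        ignored = f.readlines()[1]

    # Wrong encoding, errors replaced: the byte becomes U+FFFD.
    with open(path, "r", encoding="utf-8", errors="replace") as f:
        replaced = f.readlines()[1]
finally:
    os.remove(path)

print(repr(decoded))
print(repr(ignored))
print(repr(replaced))
```

Note that errors='ignore' loses data without any warning, so passing the correct encoding is usually the safer choice when it is known.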