Question

I generated a bug report on Android via ADB and pulled the large report file. But when I open and read the file, I get:

>>> f = open('bugreport.txt')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc0 in position 12788794: invalid start byte

>>> f = open('bugreport.txt', encoding='ascii')
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.6/encodings/ascii.py", line 26, in decode
    return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xef in position 5455694: ordinal not in range(128)

It seems that neither the UTF-8 nor the ASCII codec can decode the file. I then checked the file's encoding with two commands:

$ enca bugreport.txt
7bit ASCII characters
$ file -i bugreport.txt
bugreport.txt: text/plain; charset=us-ascii

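Both tell me the file is ASCII-encoded, yet I cannot open it with the ascii codec.
Some other clues:
1. The Python interpreter above is Python 3.6.3. I tried Python 2.7.14 and it worked fine.
2. If I open the file with the arguments errors='ignore' and encoding='ascii', it can be read, but all Chinese characters are lost.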

So how can I open this peculiar file in Python 3? Can anyone help me?

Answer 1

In Python 3, you can specify the encoding when opening the file with open() in a with block.

with open(file, encoding='utf-8') as f:
    data = f.read()
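
Since the traceback in the question shows that a strict UTF-8 decode already fails on this file, one possible variation is to keep encoding='utf-8' but pass an errors handler. The sketch below assumes the file is mostly valid UTF-8 with a few stray bytes, which the question does not confirm. The helper name read_text_lenient and the sample bytes are made up for illustration.

```python
import os
import tempfile

def read_text_lenient(path, encoding="utf-8"):
    """Decode the whole file, keeping undecodable bytes visible as \\xNN escapes."""
    # errors="backslashreplace" works for decoding on Python 3.5 and later.
    with open(path, encoding=encoding, errors="backslashreplace") as f:
        return f.read()

# Stand-in for bugreport.txt: valid UTF-8 Chinese text plus one stray 0xC0 byte.
sample = "日志 ok ".encode("utf-8") + b"\xc0" + b" end\n"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)

demo_text = read_text_lenient(tmp.name)
os.remove(tmp.name)
print(demo_text)
```

Unlike errors='ignore' with encoding='ascii', this keeps the Chinese characters and leaves the bad bytes visible in the text instead of silently dropping them.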

Answer 2

The file may be encoded as latin-1 or utf-16 (little-endian).

>>> bytes_ = [b'\xc0', b'\xef']
>>> for b in bytes_:
...     print(repr(b), b.decode('latin-1'))
... 
b'\xc0' À
b'\xef' ï
>>> bytes_ = [b'\xc0\x00', b'\xef\x00']
>>> for b in bytes_:
...     print(repr(b), b.decode('utf-16le'))
... 
b'\xc0\x00' À
b'\xef\x00' ï
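
To check a guess like this against the real file, it may help to look at the raw bytes around the position reported in the traceback. The following is a minimal sketch, assuming that position is an offset from the start of the file. The helper name bytes_around and the sample data are hypothetical.

```python
import os
import tempfile

def bytes_around(path, offset, width=8):
    """Return the raw bytes surrounding `offset`, read in binary mode (no decoding)."""
    start = max(0, offset - width)
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(offset - start + width + 1)

# Stand-in data; for the real file, pass the position from the traceback instead.
sample = b"A" * 20 + "中".encode("utf-8") + b"B" * 5 + b"\xc0" + b"C" * 20
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(sample)

window = bytes_around(tmp.name, sample.index(b"\xc0"), width=4)
os.remove(tmp.name)
print(window)
```

Note that latin-1 maps every possible byte to a character, so decoding with it never raises an error. A successful latin-1 decode therefore does not by itself prove that the file was written in latin-1.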
