Question

I am trying to read a log file from a Python script. My program works fine on Linux, but on Windows it fails: after reading up to a particular line, I get this error:

  File "C:\Python\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x8f in position 311: character maps to <undefined>

Here is the code I use to read the file:

with open(log_file, 'r') as log_file_fh:
    for line in log_file_fh:
        print(line)
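The traceback goes through cp1252.py because open() without an encoding= argument decodes with a locale-dependent default, which is typically cp1252 on Western European Windows and UTF-8 on most Linux systems. That default explains why the same script behaves differently on the two platforms. A minimal sketch that prints the default and reproduces the error message; the helper names default_text_encoding and decode_error are hypothetical, for illustration only:

```python
import locale


def default_text_encoding() -> str:
    # Encoding that open() typically uses when no encoding= argument is given;
    # it comes from the OS locale (or UTF-8 if Python's UTF-8 mode is enabled).
    return locale.getpreferredencoding(False)


def decode_error(data: bytes, encoding: str):
    """Return the UnicodeDecodeError message for data, or None if it decodes."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError as exc:
        return str(exc)
    return None


print(default_text_encoding())
# Byte 0x8f has no mapping in cp1252, so this reproduces the traceback's message
print(decode_error(b"\x8f", "cp1252"))
```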

I tried to fix it by passing different encodings, such as ascii, utf8, utf-8, ISO-8859-1, cp1252 and cp850, but I still get the same problem. Is there any way to solve this?
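When several encodings all seem to fail, it can help to locate the exact line and byte that each candidate rejects. A minimal diagnostic sketch (the helper find_undecodable is hypothetical): it reads the file in binary mode, so nothing is decoded implicitly, and assumes an ASCII-compatible encoding like those listed above, where splitting on newlines never cuts a multi-byte character. Note that ISO-8859-1 assigns a character to every byte value and never raises UnicodeDecodeError, so an error that persists with it suggests the encoding= argument may not have reached the open() call that failed.

```python
def find_undecodable(path: str, encoding: str):
    """Return (line_number, offending_bytes) for the first line that fails
    to decode with `encoding`, or None if every line decodes."""
    with open(path, "rb") as fh:  # binary mode: lines are raw bytes
        for line_number, raw in enumerate(fh, start=1):
            try:
                raw.decode(encoding)
            except UnicodeDecodeError as exc:
                # exc.start/exc.end index the bytes the codec rejected
                return line_number, raw[exc.start:exc.end]
    return None


# Hypothetical usage against the real log file:
# for enc in ("ascii", "utf-8", "cp1252", "cp850", "iso-8859-1"):
#     print(enc, find_undecodable(log_file, enc))
```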

Answer 1

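The log file I want to read with my Python script is written in a Western European language. I reviewed the list of standard encodings at https://docs.python.org/2.4/lib/standard-encodings.html and used 'cp850' as the encoding, which worked for me: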

with open(log_file, 'r', encoding='cp850') as log_file_fh:
    for line in log_file_fh:
        print(line)

However, that page lists many codecs for Western Europe. I don't think this is the right solution, because most developers advise against using 'cp850'.
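One reason "cp850 works" is weak evidence: single-byte code pages such as cp850 and ISO-8859-1 define the byte 0x8f, so they decode it without error, but each maps it to a different character, while cp1252 leaves it undefined. A short sketch comparing them; the helper decode_results is hypothetical:

```python
def decode_results(data: bytes, encodings):
    """Map each encoding name to the decoded text, or None if it fails."""
    results = {}
    for enc in encodings:
        try:
            results[enc] = data.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None  # at least one byte is undefined in this codec
    return results


# The same byte (0x8f, from the traceback) read through three codecs
print(decode_results(b"\x8f", ["cp1252", "cp850", "iso-8859-1"]))
```

Only knowledge of how the log was written can tell which of these characters is the intended one.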

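For my case, the best way to handle the encoding errors is to pass the errors argument to open() with the value 'ignore'. Characters that cannot be decoded are then silently skipped. This option is fine for me because I don't need the entire file content; I only want some specific log entries.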

with open(log_file, 'r', errors='ignore') as log_file_fh:
    for line in log_file_fh:
        print(line)
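The snippet above omits encoding=, so it still decodes with the platform default (cp1252 on the Windows machine from the traceback) and may behave differently on Linux. A hedged sketch that passes the encoding explicitly and keeps only the lines of interest; the helper matching_lines and the keyword filter are hypothetical. Swapping errors='ignore' for errors='replace' keeps a U+FFFD marker where a byte was dropped, which makes the data loss visible:

```python
def matching_lines(path, keyword, encoding, errors="ignore"):
    """Yield lines containing `keyword`, decoding with an explicit encoding.

    errors="ignore" drops undecodable bytes silently;
    errors="replace" substitutes U+FFFD so the loss is visible.
    """
    with open(path, "r", encoding=encoding, errors=errors) as fh:
        for line in fh:
            if keyword in line:
                yield line.rstrip("\n")


# Hypothetical usage:
# for entry in matching_lines(log_file, "ERROR", encoding="cp1252"):
#     print(entry)
```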

Answer 2

Edit: as suggested, open the file in binary mode: with open(log_file, 'rb')

Then decode each line as UTF-8 in your code:

with open(log_file, 'rb') as log_file_fh:
    for line in log_file_fh:
        # In binary mode each line is a bytes object, so it can be decoded explicitly
        line = line.decode('utf-8')
        print(line)
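Strict UTF-8 decoding will still raise on this file, because a lone 0x8f byte is not valid UTF-8 (it is a continuation byte with no lead byte). A hedged sketch that tries UTF-8 first and falls back to a single-byte codec for lines that fail; the helper names decode_line and read_log are hypothetical, and the fallback codec (ISO-8859-1 here, which never fails, or cp850 as in the first answer) is an assumption about how the log was written:

```python
def decode_line(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Decode one line as UTF-8, falling back to a single-byte codec."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 lines come from a single-byte Western code page
        return raw.decode(fallback)


def read_log(path, fallback="iso-8859-1"):
    """Yield decoded lines without trailing newline characters."""
    with open(path, "rb") as fh:  # binary mode: decode each line ourselves
        for raw in fh:
            yield decode_line(raw, fallback).rstrip("\r\n")
```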
