Question

I was wondering if anyone could help me. I searched beforehand but couldn't find an answer:

I have a file named info.dat that contains:

#
# *** Please be aware that the revision numbers on the control lines may not always
# *** be 1 more than the last file you received. There may have been additional
# *** increments in between.
#
$001,427,2018,04,26
#
# Save this file as info.dat
#

I am trying to loop through the file, pick out the version number, and write it to its own file:

with open('info.dat', 'r') as file:
    for line in file:
        if line.startswith('$001,'):
            with open('version.txt', 'w') as w:
                version = line[5:8] # Should be 427
                w.write(version + '\n')
                w.close()

While this does write the correct information, I keep getting the following error:

Traceback (most recent call last):
File "~/Desktop/backup/test.py", line 4, in <module>
for line in file:
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/encodings/ascii.py", line 26, in decode
return codecs.ascii_decode(input, self.errors)[0]
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe4 in position 6281: ordinal not in range(128)

When I tried adding the following:

with open('info.dat', 'r') as file:
    for line in file:
        if line.startswith('$001,'):
            with open('version.txt', 'w') as w:
                version = line[5:8]
                # w.write(version.encode('utf-8') + '\n')
                w.write(version.decode() + '\n')
                w.close()

I get the following error:

Traceback (most recent call last):
File "~/Desktop/backup/test.py", line 9, in <module>
w.write(version.encode('utf-8') + '\n')
TypeError: can't concat str to bytes

Answer 1

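You are opening the input as a text file, which implicitly decodes every line with your default encoding. You then manually re-encode each line as UTF-8 and try to write the result to another text file, which expects str and would implicitly encode it with your default encoding again. That won't work. The good news is that the right thing to do is much simpler.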
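To see which encoding a text-mode open() falls back on when no encoding= argument is given, a minimal sketch along these lines may help. The sample line containing the byte 0xe4 is made up for illustration; the real info.dat content around position 6281 is not shown in the question.

```python
import locale

# Encoding that text-mode open() typically falls back on when encoding= is omitted.
default_text_encoding = locale.getpreferredencoding(False)
print("default text encoding:", default_text_encoding)

# Hypothetical line containing the byte from the traceback (0xe4).
raw_line = b"# M\xe4rz\n"

# Reproduce what an ASCII default does with that byte.
decode_error = None
try:
    raw_line.decode("ascii")
except UnicodeDecodeError as exc:
    decode_error = exc
    print("ascii decode failed:", exc)
```

If the printed default is an ASCII variant (as the traceback suggests it was here), any byte above 0x7f in the file triggers this error as soon as the line containing it is read.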

If you know the input file is UTF-8 (it probably isn't; see below), just open the file as UTF-8 instead of with your default encoding:

with open('info.dat', 'r', encoding='utf-8') as file:
    for line in file:
        if line.startswith('$001,'):
            # The with block closes w on exit, so no explicit close() is needed.
            with open('version.txt', 'w', encoding='utf-8') as w:
                version = line[5:8] # Should be 427
                w.write(version + '\n')

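In fact, I'm pretty sure your file is not UTF-8 but Latin-1 (in Latin-1, \xe4 is ä; in UTF-8, it is the first byte of a 3-byte sequence, which would probably encode a CJK character). If so, you can do the same thing with the correct encoding, encoding='latin-1', instead of the wrong one, and it will work.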
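The two readings of the byte 0xe4 can be checked directly in an interpreter. This is only a sketch of the reasoning above; the two continuation bytes used at the end are an arbitrary example and are not taken from the asker's file.

```python
byte = b"\xe4"

# Latin-1 maps every byte to the code point with the same value.
as_latin1 = byte.decode("latin-1")
print(repr(as_latin1))

# In UTF-8, 0xE4 is 0b11100100; the 1110xxxx pattern marks the
# first byte of a 3-byte sequence.
is_three_byte_lead = (byte[0] & 0xF0) == 0xE0

# On its own the sequence is incomplete, so strict decoding fails.
try:
    byte.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Followed by two continuation bytes it decodes to a CJK character.
cjk = b"\xe4\xb8\xad".decode("utf-8")
print(is_three_byte_lead, utf8_ok, repr(cjk))
```

A lone 0xe4 surrounded by ordinary ASCII text is therefore much more plausible as Latin-1 (or a similar single-byte encoding) than as UTF-8.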

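But if you don't know what the encoding is, don't try to guess; just stick to binary mode. That means passing the rb and wb modes instead of r and w, and using bytes literals: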

with open('info.dat', 'rb') as file:
    for line in file:
        if line.startswith(b'$001,'):
            # The with block closes w on exit, so no explicit close() is needed.
            with open('version.txt', 'wb') as w:
                version = line[5:8] # Should be b'427'
                w.write(version + b'\n')

Either way, there is no need to call encode or decode anywhere; let the file objects handle it for you, and deal with only one type (either str or bytes) throughout.
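As a self-contained illustration of the bytes-only approach, here is a rough sketch that builds a small sample file in a temporary directory and extracts the revision from it. The helper name write_version, the sample file contents, and the choice to split on commas rather than slice fixed positions are all assumptions made for this example.

```python
import os
import tempfile


def write_version(src_path, dst_path, prefix=b"$001,"):
    """Copy the revision field of the control line in src_path to dst_path.

    Returns the revision as bytes, or None if no control line is found.
    """
    with open(src_path, "rb") as src:
        for line in src:
            if line.startswith(prefix):
                # Control line looks like b"$001,427,2018,04,26\n";
                # the second comma-separated field is the revision.
                version = line.split(b",")[1]
                with open(dst_path, "wb") as dst:
                    dst.write(version + b"\n")
                return version
    return None


# Build a hypothetical info.dat that contains non-ASCII (Latin-1) bytes.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, "info.dat")
dst_path = os.path.join(workdir, "version.txt")
with open(src_path, "wb") as f:
    f.write(b"#\n# Gr\xfc\xdfe, M\xe4rz\n$001,427,2018,04,26\n#\n")

found = write_version(src_path, dst_path)
with open(dst_path, "rb") as f:
    written = f.read()
print(found, written)
```

Because nothing is ever decoded, the non-ASCII bytes in the comment lines cannot raise UnicodeDecodeError, whatever encoding the file really uses.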

Answer 2

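encode() returns bytes, but '\n' is a str. You can't concatenate bytes with str, so make both operands bytes (bytes + bytes). Try this (note that w must then be opened in 'wb' mode, because a text-mode file only accepts str):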

w.write(version.encode('utf-8') + b'\n')
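The type rules involved can be checked without touching the disk. In this sketch, io.BytesIO stands in for a file opened with 'wb' and io.TextIOWrapper for one opened with 'w'; both stand-ins are assumptions made to keep the example self-contained.

```python
import io

version = "427"

# Mixing bytes and str in a concatenation raises TypeError.
try:
    version.encode("utf-8") + "\n"
    mixed_ok = True
except TypeError:
    mixed_ok = False

# bytes + bytes works.
payload = version.encode("utf-8") + b"\n"

# Stand-in for open('version.txt', 'wb'): accepts bytes.
binary_file = io.BytesIO()
binary_file.write(payload)

# Stand-in for open('version.txt', 'w'): text streams reject bytes.
text_file = io.TextIOWrapper(io.BytesIO(), encoding="utf-8")
try:
    text_file.write(payload)
    text_accepts_bytes = True
except TypeError:
    text_accepts_bytes = False

print(mixed_ok, payload, text_accepts_bytes)
```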
