Question

I have saved a unicode string in this format, b'\x1e\x80E\xd7\xd4M\x94\xa8\xb4\xf3bl[^', in a text file, but when I read it back from this external text file it comes back as an ordinary string.

I tried reading the file in binary mode, e.g. open(celesi_file_path, "rb"):

fciphertext = open(ciphertext_file_path, "rb")
fkey = open(celesi_file_path,"rb")
celesi = fkey.read()
ciphertext = fciphertext.read()
ciphertext = ciphertext.decode('latin-1')
celesi = celesi.decode('latin-1')
print(type(celesi))
print(type(ciphertext))
print(celesi)
print(ciphertext)

The output is the following string: "b'\x1e\x80E\xd7\xd4M\x94\xa8\xb4\xf3bl[^'", whereas I want the actual bytes, not a string in this format.

Answer 1

Take a look at this:

>>> data = b'\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
>>> str(data)
"b'\\xd0\\x9f\\xd1\\x80\\xd0\\xb8\\xd0\\xb2\\xd0\\xb5\\xd1\\x82'"

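So if you wrote str(data) to the file, you actually wrote the backslashes and the x's. You did not write the bytes; you wrote the string representation of those bytes that Python provides. In this example you wrote 51 bytes (!) instead of the original 12.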
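If a file already contains the textual representation, the bytes can usually be recovered by parsing that text as a Python literal. This is only a minimal sketch and not part of the original answer: it assumes the file holds exactly one bytes literal as produced by str(data), and the helper name parse_bytes_repr is hypothetical.

```python
import ast


def parse_bytes_repr(text: str) -> bytes:
    # Hypothetical helper: parse text that looks like a Python bytes
    # literal (b'...') back into a real bytes object.
    # ast.literal_eval only evaluates literals, so it does not run code.
    value = ast.literal_eval(text.strip())
    if not isinstance(value, bytes):
        raise ValueError("text is not a bytes literal")
    return value


data = b'\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
text = repr(data)                   # same text that str(data) produces
recovered = parse_bytes_repr(text)  # the original 12 bytes again

print(len(text), len(recovered))
```

For this to work, the file has to be read in text mode (or read in binary mode and decoded) so that the literal reaches parse_bytes_repr as a str.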

You should write the bytes themselves:

with open("data.bin", "wb") as f:
    f.write(data)

Then open this file in binary mode as well and read the bytes.
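The full round trip might look like the sketch below. Writing into a temporary directory is an assumption made only to keep the example self-contained; any path would do.

```python
import os
import tempfile

data = b'\xd0\x9f\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'

# Assumption for the example: write into a throwaway directory.
path = os.path.join(tempfile.mkdtemp(), "data.bin")

# Write the raw bytes, not their string representation.
with open(path, "wb") as f:
    f.write(data)

# Read them back in binary mode; no decode() call is needed.
with open(path, "rb") as f:
    loaded = f.read()

print(type(loaded), len(loaded))
```

Here loaded is a bytes object equal to data, and the file on disk is 12 bytes long rather than 51.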

unicode is not interpreted as unicode in Python
