Question

I'm very new to binary data and I'm struggling a bit. I'm trying to convert a binary file to text. Here is my code so far:

 with open(file_path, 'rb') as f:
  data = f.read()
  temp_data = str(data)

  if temp_data[-1] == '\\':
    temp_data = temp_data[:-1]

  temp_data = bytes(temp_data, 'utf-8')
  text = temp_data.decode('utf-8')

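It seems to work... partially. In the long byte string I can see some of the content I'm after, such as a file name and timestamps, but I still see a lot of byte values. The value of the text variable is: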

 b'\x00\x00\x00\x00T\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x004\x01\x00\x00\x03\x00\x00\x00\x00\x00\x00\x00X\x01\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00x\x01\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x02\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00TCODEF1001.DAR_MeasLog.2019-03-05+01:10:45.2019-03-05+01:11:21.1.100.0\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x95\xcc}\\\xba\xcc}\\LOG\x00\x00\x00\x00\x00\x00\x00\x00\x00OKL\x00\x04\x00\x00\x00\x01\x00\x00\x00VKL\x00\x05\x00\x00\x00\x01\x00\x00\x00YKL\x00\x06\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00h\xcc}\\\x00\x00\x00\x00\x01\x00\x00\x00\x01\x00\x00\x00\xa4\xcc}\\\x02\x00\x00\x00\x02\x00\x00\x00\x01\x00\x00\x00M\x00\x00\x00\x95\xcc}\\\xb9\xcc}\\'

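I don't know how to fix this, or what it means.

Note: I need to strip the trailing '\' character from the string, because decoding gives me an error along the lines of "cannot decode because the last character is '\'".

Thanks!

Edit: I changed the code, so it now looks like this: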

 with open(file_path, 'rb') as f:
  data = f.read()

  readable_str = data.decode('utf-16')
  bytes_again = readable_str.encode('utf-16')

When I print readable_str, I get non-ASCII characters that shouldn't be there at all. I get text like this:

TĴŘŸ䍔䑏䙅〱㄰䐮剁䵟慥䱳杯㈮㄰ⴹ㌰〭⬵㄰ㄺ㨰㔴㈮㄰ⴹ㌰〭⬵㄰ㄺ㨱ㄲㄮㄮ〰〮첕屽첺屽佌G䭏L䭖L䭙L챨屽첤屽M첕屽첹屽

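Decoding does not work with 'utf-8' or 'utf-32'. Is there a way to tell which encoding to use based on this? Are there other encodings I haven't tried? Thanks!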

Answer 1

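Python 3 is more explicit than earlier versions about how data is read and written. Almost always assume you are dealing with bytes: decode them before processing the data in your script, then encode back to bytes before writing it out.
I strongly recommend watching nedbat's talk on Unicode in Python and how to handle bytes input/output correctly.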

Anyway, what you want to do is

with open('file.txt', 'rb') as fo:
    data = fo.read()  # This is in bytes

# Decode the bytes into a str we can work with
readable_str = data.decode('utf-8')

bytes_again = readable_str.encode('utf-8')
with open('other_file.txt', 'wb') as fw:
    fw.write(bytes_again)
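
One caveat, judging from the output posted in the question: the file looks like a binary record format with ASCII strings embedded in it rather than text in a single encoding, so it may be that no codec turns the whole thing into readable text. A minimal sketch of an alternative, assuming that is the case: pull out the printable ASCII runs with a regular expression and unpack the 4-byte fields separately. The helper names `printable_runs` and `le_u32_timestamp` are made up for illustration, and reading `\x95\xcc}\\` as a little-endian 32-bit Unix timestamp is a guess (it happens to match the time in the embedded file name).

```python
import re
import struct
from datetime import datetime, timezone


def printable_runs(data, min_len=4):
    """Return every run of at least min_len printable ASCII bytes in data."""
    # 0x20-0x7e is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]


def le_u32_timestamp(chunk):
    """Interpret 4 bytes as a little-endian unsigned Unix timestamp (assumption)."""
    (seconds,) = struct.unpack("<I", chunk)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# A shortened excerpt of the bytes shown in the question.
sample = (
    b"\x01\x00\x00\x00TCODEF1001.DAR_MeasLog.2019-03-05+01:10:45"
    b"\x00\x00\x00\x00\x95\xcc}\\\xba\xcc}\\LOG\x00\x00"
)

# The file name comes out cleanly; short fragments of binary fields that
# happen to be printable can show up too and need filtering by hand.
print(printable_runs(sample))

print(le_u32_timestamp(b"\x95\xcc}\\"))  # 2019-03-05 01:10:45+00:00
```

If the format has a specification, unpacking each field with `struct` according to that layout is more reliable than guessing from the printable fragments.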

Python: bytes not converting correctly?
