Question

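I am trying to use a piece of code written for Python 2.7 in Python 3.6, and I am having trouble with how byte strings are handled. The code reads .dat files that existed before the code was written. Running the untouched Python 2.7 script under Python 3.6 returns the following error: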

import numpy as np

buff = ''
dt = np.dtype([('var1', np.uint32, 1), ('var2', np.uint8, 1)])

with open(filename, 'rb') as f:
    for line in f:
        dat = line
--->    buff += dat

    data = np.frombuffer(buffer=buff, dtype=dt)

TypeError: must be str, not bytes

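If I understand correctly, Python 2 would concatenate the bytes it read onto the string buff without complaint, whereas Python 3 cares about the distinction between bytes and strings. Casting with str(line) returns the following error instead: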

    for line in f:
        dat = str(line)
        buff += dat
->  data = np.frombuffer(buffer=buff, dtype=dt)

AttributeError: 'str' object has no attribute '__buffer__'

What should I do? What type should buff be? Is there a solution that works on both Python 2.7 and Python 3.6?

Edit

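It turns out that the data in filename.dat is not made up of unicode strings at all. I have edited the question to remove the statements that were based on my wrong assumption, and I have added some code in an attempt to show what I now realize is a minimal example. Sorry for the confusion.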

Answer 1

Use io.BytesIO as the buffer. It is compatible with both Python 2 and 3, and for large datasets it is a better choice than str/bytes concatenation.

import io

import numpy as np


buff = io.BytesIO()
dt = np.dtype([('var1', np.uint32, 1), ('var2', np.uint8, 1)])

with open(filename, 'rb') as f:
    for line in f:
        buff.write(line)

    buff.seek(0)
    data = np.frombuffer(buffer=buff.read(), dtype=dt)
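
To try this without the original file, here is a minimal, self-contained sketch. The sample records, the temporary file name example.dat, and the packed layout (5 bytes per record, no padding) are assumptions made for illustration, not details taken from the question's data. It also shows the other straightforward fix: starting the accumulator as b'' instead of ''. Note that str(line) in Python 3 does not convert the bytes; it returns their repr (text of the form "b'...'"), which is why np.frombuffer rejects it.

```python
import io
import os
import tempfile

import numpy as np

# Record layout as in the question: a uint32 followed by a uint8.
# Assumption: the file is packed, so each record is 5 bytes.
dt = np.dtype([('var1', np.uint32), ('var2', np.uint8)])

# Hypothetical sample data written to a temporary file.
records = np.array([(1, 10), (2, 20), (3, 30)], dtype=dt)
path = os.path.join(tempfile.mkdtemp(), 'example.dat')
records.tofile(path)

# Option 1: accumulate the raw bytes in an io.BytesIO buffer.
# Iterating a binary file splits on b'\n' (byte value 10, which occurs
# in this sample), but writing the chunks back in order restores the
# original byte stream.
buff = io.BytesIO()
with open(path, 'rb') as f:
    for line in f:
        buff.write(line)
data = np.frombuffer(buff.getvalue(), dtype=dt)

# Option 2: keep the original loop, but start from a bytes literal.
# In Python 2, b'' is the same type as '', so this runs on both versions.
raw = b''
with open(path, 'rb') as f:
    for line in f:
        raw += line
same = np.frombuffer(raw, dtype=dt)
```

If the file is read in one go anyway, f.read() or np.fromfile(path, dtype=dt) would avoid the loop entirely. The line-by-line version is kept here only because it mirrors the question's code.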
