Question

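I am trying to make some relatively simple Python 2 modules compatible with Python 3. I currently have a data file that, for the purposes of an MWE, looks like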

n
0

The following snippet works in Python 2.7. It is essentially a workaround that gives behavior like genfromtxt's names=True while letting me use the same code in both Python 2.7 and 3.5.

import numpy as np
with open('bad_int.data', 'rb') as f: lines = f.readlines()
data = np.loadtxt(lines[1:2], dtype=[('n', int)])

With Python 3.5, I get the error

Traceback (most recent call last):
  File "bad_int3.py", line 5, in <module>
    data = np.loadtxt(lines[1:2], dtype=[('n',int)])
  File "/usr/lib64/python3.5/site-packages/numpy/lib/npyio.py", line 938, in loadtxt
    X = np.array(X, dtype)
ValueError: invalid literal for int() with base 10: "b'0'"

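I know there are other ways to load a file like this, but I have to slice it by lines for now because it contains multiple arrays. I tried to find out what the leading b means (binary?), but have had no luck so far. So how can I read this kind of data in both Python 2.7 and 3.5 without getting this error?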

Edit

I just noticed that everything works if there is more than one field. For example, if the data is changed to

n m
0 0

and the last line of the code to

data = np.loadtxt(lines[1:2], dtype=[('n', int), ('m', int)])

then everything works perfectly in both Python 2.7 and 3.5.

Answer 1

In Py3 you need to open the file in binary mode:

import numpy as np
with open('data', 'rb') as f:
    lines = f.readlines()
    data = np.loadtxt(lines[1:2], dtype=[('n', int)])

loadtxt (and genfromtxt) work with byte strings, so when they open a file themselves, they use rb.
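A sketch of the question's names=True-style workaround for a recent NumPy: the file is still read in 'rb' mode and sliced by line, but each line is decoded to str before it reaches loadtxt, and the field names are taken from the header line. `load_named_block` is a hypothetical helper, every field is assumed to be an int, and this was written against a current NumPy, not the 3.5-era release in the traceback, where loadtxt may convert lines back to bytes internally.

```python
import io
import numpy as np


def load_named_block(lines, header_index, start, stop):
    """Hypothetical helper: mimic genfromtxt(names=True) on a slice of lines.

    Field names are read from lines[header_index]; every field is assumed
    to hold integers. Bytes lines (from a file opened with 'rb') are decoded
    to str first so NumPy never sees the b'...' form.
    """
    text = [ln.decode('ascii') if isinstance(ln, bytes) else ln for ln in lines]
    names = text[header_index].split()
    dtype = [(name, int) for name in names]
    # ndmin=1 keeps a one-row block as shape (1,) instead of a 0-d array
    return np.loadtxt(text[start:stop], dtype=dtype, ndmin=1)


# io.BytesIO stands in for open('bad_int.data', 'rb')
with io.BytesIO(b"n\n0\n") as f:
    lines = f.readlines()
data = load_named_block(lines, 0, 1, 2)
print(data['n'])
```

The same helper handles the two-field case from the question's edit, since the dtype is built from whatever names the header line contains.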

You could also try:

data = np.loadtxt('data', skiprows=1, dtype=[('n',int)])
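Since the file holds several arrays, skiprows alone only picks where reading starts. On newer NumPy (1.16 and later), loadtxt also accepts max_rows, so one block can be read straight from the file without slicing lines. A minimal sketch; the two-block layout in `text` is an assumption standing in for the real file.

```python
import io
import numpy as np

# Stand-in for a file that holds two blocks, each a header line plus data rows
text = u"n\n0\n1\nm\n5\n"

# First block: skip its header line, then read exactly two data rows.
# max_rows is only available in NumPy 1.16 and later.
first = np.loadtxt(io.StringIO(text), skiprows=1, max_rows=2,
                   dtype=[('n', int)], ndmin=1)

# Second block: skip the first four lines (header, two rows, next header)
second = np.loadtxt(io.StringIO(text), skiprows=4, max_rows=1,
                    dtype=[('m', int)], ndmin=1)
print(first['n'], second['m'])
```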

The leading b denotes a byte string. Py3 strings are unicode by default.
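The "b'0'" in the error message can be reproduced without NumPy: in Py3, calling str() on a bytes object returns its repr, leading b included, and int() rejects that text. Decoding gives the plain string instead. That the old loadtxt ends up calling str() on the raw field is an assumption consistent with the traceback, not something confirmed here.

```python
raw = b'0'            # what readlines() yields for the line "0" in 'rb' mode on Py3
as_text = str(raw)    # str() of bytes gives the repr, not the decoded text
print(as_text)        # b'0'

try:
    int(as_text)
except ValueError as exc:
    print(exc)        # invalid literal for int() with base 10: "b'0'"

print(int(raw.decode('ascii')))  # 0
```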

In [99]: txt=b"""n
    ...: 0
    ...: 1"""
In [100]: np.loadtxt(txt.splitlines()[1:], dtype=int)
Out[100]: array([0, 1])

But with your dtype:

In [101]: dt=np.dtype([('n',int)])
In [102]: np.loadtxt(txt.splitlines()[1:], dtype=dt)
...
ValueError: invalid literal for int() with base 10: "b'0'"

But this works:

In [103]: np.genfromtxt(txt.splitlines()[1:], dtype=dt)
Out[103]: 
array([(0,), (1,)], 
      dtype=[('n', '<i4')])

Or let genfromtxt create the dtype:

In [105]: np.genfromtxt(txt.splitlines(), dtype=None, names=True)
Out[105]: 
array([(0,), (1,)], 
      dtype=[('n', '<i4')])

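So the problem lies in how loadtxt handles this dtype. I hadn't run into this before, but then I haven't seen many cases that load only a single column.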
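If loadtxt's handling of a one-field structured dtype is the culprit, one workaround that stays with loadtxt is to load the column with a plain int dtype (the In [100] case that works) and copy it into a structured array afterwards. A minimal sketch; `lines` stands in for the file's lines, and the decode step is a precaution for newer NumPy releases rather than something the session above needed.

```python
import numpy as np

# Stand-in for the lines readlines() returns in 'rb' mode
lines = [b'n\n', b'0\n', b'1\n']
dt = np.dtype([('n', int)])

# Step 1: load the column with a plain int dtype, avoiding the
# single-field structured dtype path. Decoding first is a precaution.
values = np.loadtxt([ln.decode('ascii') for ln in lines[1:]], dtype=int, ndmin=1)

# Step 2: copy the values into a structured array carrying the field name
data = np.empty(values.shape, dtype=dt)
data['n'] = values
print(data)
```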
