Question

I am trying to create an array with 5 columns imported from a data file. Four of them are floats and the last one is a string.

The data file looks like this:

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa

I tried the following:

data = np.genfromtxt(filename, dtype = "float,float,float,float,str", delimiter = ",")

data = np.loadtxt(filename, dtype = "float,float,float,float,str", delimiter = ",")

However, both calls import only the first column.

Why? How can I fix it?

Thanks for your patience! :)

Answer 1

You have to specify the string type properly, with an explicit width: "U20", for example, for strings of up to 20 characters:

data = np.loadtxt('data.txt', dtype = "float,"*4 + "U20", delimiter = ",")

This seems to work:

array([( 5.1,  3.5,  1.4,  0.2, 'Iris-setosa'),
       ( 4.9,  3. ,  1.4,  0.2, 'Iris-setosa'),
       ( 4.7,  3.2,  1.3,  0.2, 'Iris-setosa'),
       ( 4.6,  3.1,  1.5,  0.2, 'Iris-setosa'),
       ( 5. ,  3.6,  1.4,  0.2, 'Iris-setosa'),
       ( 5.4,  3.9,  1.7,  0.4, 'Iris-setosa'),
       ( 4.6,  3.4,  1.4,  0.3, 'Iris-setosa'),
       ( 5. ,  3.4,  1.5,  0.2, 'Iris-setosa')],
      dtype=[('f0', '<f8'), ('f1', '<f8'), ('f2', '<f8'), ('f3', '<f8'), ('f4', '<U20')])
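
Once the file is loaded this way, each column can be read by its auto-generated field name (`f0` to `f4`, as shown in the dtype above). The following is a minimal sketch rather than a definitive recipe: it assumes an in-memory `io.StringIO` buffer standing in for `data.txt`, uses only the first three rows from the question, and the variable names `sample`, `features`, and `labels` are purely illustrative.

```python
import io

import numpy as np

# In-memory stand-in for the data file (assumption: first three rows only).
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# Four float fields followed by one Unicode string field of up to 20 characters.
data = np.loadtxt(sample, dtype="float," * 4 + "U20", delimiter=",")

# The structured array is 1-D: one record per row of the file.
print(data.shape)

# Each field is accessed by name and comes back as a plain 1-D array.
labels = data["f4"]

# Stack the four float fields into an ordinary 2-D float array for numeric work.
features = np.column_stack([data[name] for name in ("f0", "f1", "f2", "f3")])
print(features.shape, features.dtype)
```

Keeping the floats in their own 2-D array is one way to do fast numeric work while the labels stay in a separate string array.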

Another approach, using pandas, gives you an object array, but this will slow down further computations:

In [336]: pd.read_csv('data.txt',header=None).values
Out[336]: 
array([[5.1, 3.5, 1.4, 0.2, 'Iris-setosa'],
       [4.9, 3.0, 1.4, 0.2, 'Iris-setosa'],
       [4.7, 3.2, 1.3, 0.2, 'Iris-setosa'],
       [4.6, 3.1, 1.5, 0.2, 'Iris-setosa'],
       [5.0, 3.6, 1.4, 0.2, 'Iris-setosa'],
       [5.4, 3.9, 1.7, 0.4, 'Iris-setosa'],
       [4.6, 3.4, 1.4, 0.3, 'Iris-setosa'],
       [5.0, 3.4, 1.5, 0.2, 'Iris-setosa']], dtype=object)
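
If the pandas route is preferred, one possible way to avoid doing arithmetic on an object array is to split the numeric columns from the label column before converting to NumPy. This is only a sketch under the same assumptions as the question's data: an in-memory `io.StringIO` buffer stands in for `data.txt`, and the names `sample`, `df`, `features`, and `labels` are illustrative.

```python
import io

import pandas as pd

# In-memory stand-in for the data file (assumption: first three rows only).
sample = io.StringIO(
    "5.1,3.5,1.4,0.2,Iris-setosa\n"
    "4.9,3.0,1.4,0.2,Iris-setosa\n"
    "4.7,3.2,1.3,0.2,Iris-setosa\n"
)

# header=None because the file has no header row; columns are numbered 0..4.
df = pd.read_csv(sample, header=None)

# The first four columns are all floats, so this yields a float array
# instead of an object array.
features = df.iloc[:, :4].to_numpy()

# The last column holds the string labels.
labels = df.iloc[:, 4].to_numpy()

print(features.dtype, features.shape)
```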

How to import a NumPy structured array from a data file
