numpy array to ndarray

Time: 2016-11-17 16:22:33

Tags: python python-2.7 pandas numpy

I have an exported pandas DataFrame that is now a numpy.array object.

subset = array[:4,:]
array([[  2.        ,  12.        ,  33.33333333,   2.        ,
         33.33333333,  12.        ],
       [  2.        ,   2.        ,  33.33333333,   2.        ,
         33.33333333,   2.        ],
       [  2.8       ,   8.        ,  45.83333333,   2.75      ,
         46.66666667,  13.        ],
       [  3.11320755,  75.        ,  56.        ,   3.24      ,
         52.83018868,  33.        ]])
print subset.dtype
dtype('float64')

I want to convert the column values to specific types and also set column names, which means I need to convert it to an ndarray.

Here are my dtypes:

[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'), 
('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8'),
('NULL_COUNT_B', '<f8')]

When I convert the array, I get:

 ValueError: new type not compatible with array.

How can I convert each column to a specific type so that I can convert the array to an ndarray?

Thanks

1 answer:

Answer 0 (score: 2)

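You already have an ndarray. What you are looking for is a structured array, one with this compound dtype. First see whether pandas can do this for you. If that fails, we can probably work something out with tolist and a list comprehension.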
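As a rough sketch of the pandas route, assuming the original DataFrame is still available: cast the integer column with astype, then let DataFrame.to_records build a record array whose field names are the column names. The frame df below is a hypothetical stand-in built from two rows of the question's data, not the asker's actual frame.

```python
import pandas as pd

# Hypothetical frame rebuilt from two rows of the question's data.
names = ['PERCENT_A_NEW', 'JoinField', 'NULL_COUNT_B',
         'PERCENT_COMP_B', 'RANKING_A', 'RANKING_B']
df = pd.DataFrame([[2.0, 12.0, 33.33333333, 2.0, 33.33333333, 12.0],
                   [2.8, 8.0, 45.83333333, 2.75, 46.66666667, 13.0]],
                  columns=names)

# Cast the one integer column, then convert; index=False keeps the
# DataFrame index out of the resulting fields.
rec = df.astype({'JoinField': 'int32'}).to_records(index=False)

print(rec.dtype.names)
print(rec['JoinField'])
```

The result is a numpy recarray, which is a structured array that also allows attribute access to its fields.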

In [84]: dt=[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'),
    ...: ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8'),
    ...: ('NULL_COUNT_B', '<f8')]
In [85]: subset=np.array([[  2.        ,  12.        ,  33.33333333,   2.        ,
    ...:          33.33333333,  12.        ],
    ...:        [  2.        ,   2.        ,  33.33333333,   2.        ,
    ...:          33.33333333,   2.        ],
    ...:        [  2.8       ,   8.        ,  45.83333333,   2.75      ,
    ...:          46.66666667,  13.        ],
    ...:        [  3.11320755,  75.        ,  56.        ,   3.24      ,
    ...:          52.83018868,  33.        ]])
In [86]: subset
Out[86]: 
array([[  2.        ,  12.        ,  33.33333333,   2.        ,
         33.33333333,  12.        ],
       [  2.        ,   2.        ,  33.33333333,   2.        ,
         33.33333333,   2.        ],
       [  2.8       ,   8.        ,  45.83333333,   2.75      ,
         46.66666667,  13.        ],
       [  3.11320755,  75.        ,  56.        ,   3.24      ,
         52.83018868,  33.        ]])

Now make an array with dt. The input for a structured array must be a list of tuples, so I use tolist and a list comprehension:

In [87]: np.array([tuple(row) for row in subset.tolist()],dtype=dt)
....
ValueError: field 'NULL_COUNT_B' occurs more than once
In [88]: subset.shape
Out[88]: (4, 6)
In [89]: dt
Out[89]: 
[('PERCENT_A_NEW', '<f8'),
 ('JoinField', '<i4'),
 ('NULL_COUNT_B', '<f8'),
 ('PERCENT_COMP_B', '<f8'),
 ('RANKING_A', '<f8'),
 ('RANKING_B', '<f8'),
 ('NULL_COUNT_B', '<f8')]
In [90]: dt=[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'),
    ...: ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8')]
In [91]: np.array([tuple(row) for row in subset.tolist()],dtype=dt)
Out[91]: 
array([(2.0, 12, 33.33333333, 2.0, 33.33333333, 12.0),
       (2.0, 2, 33.33333333, 2.0, 33.33333333, 2.0),
       (2.8, 8, 45.83333333, 2.75, 46.66666667, 13.0),
       (3.11320755, 75, 56.0, 3.24, 52.83018868, 33.0)], 
      dtype=[('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'), ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8')])
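A possible alternative that skips the round trip through Python tuples, offered as a sketch rather than as the method used above: allocate an empty structured array with the corrected six-field dt and copy one column at a time. Assigning into a field casts the values, so the float column that ends up in JoinField becomes int32. The name out is an assumption of this sketch.

```python
import numpy as np

subset = np.array([[2.0, 12.0, 33.33333333, 2.0, 33.33333333, 12.0],
                   [2.0, 2.0, 33.33333333, 2.0, 33.33333333, 2.0],
                   [2.8, 8.0, 45.83333333, 2.75, 46.66666667, 13.0],
                   [3.11320755, 75.0, 56.0, 3.24, 52.83018868, 33.0]])

# Six unique field names, one per column (the duplicate entry is dropped).
dt = [('PERCENT_A_NEW', '<f8'), ('JoinField', '<i4'), ('NULL_COUNT_B', '<f8'),
      ('PERCENT_COMP_B', '<f8'), ('RANKING_A', '<f8'), ('RANKING_B', '<f8')]

# One record per row of subset; contents are uninitialised until filled below.
out = np.empty(subset.shape[0], dtype=dt)

# Copy column i into field i; the assignment casts to the field's type.
for i, name in enumerate(out.dtype.names):
    out[name] = subset[:, i]

print(out['JoinField'])
# Bytes per record in the structured dtype vs. bytes per row of subset.
print(out.dtype.itemsize, subset.dtype.itemsize * subset.shape[1])
```

The two sizes printed last differ because JoinField takes 4 bytes instead of 8. That mismatch may be why a conversion that merely reinterprets the float64 buffer, such as view, reports an incompatible type, whereas copying field by field does not depend on the byte layout.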