I have the following loop:
# `results` is obtained from a MySQLdb query.
for row in results:
    print(row)
It prints tuples like these:
('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0)
('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107)
('1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883)
('1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837)
My question is: from that iteration, how can I create a NumPy ndarray that looks like this:
array([['1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
['1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107],
['1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883],
['1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837]])
In the end, the ndarray should have shape (4, 8).
Answer 0 (score: 2)
Read it into a structured array:
In [30]:
import numpy as np
a = [('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
     ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
     ('1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883),
     ('1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837)]
# Two 10-byte string fields followed by six float32 fields
np.array(a, dtype='S10,S10,f4,f4,f4,f4,f4,f4')
Out[30]:
array([('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
('1A9N', 'RBP', 0.045626699924468994, 0.053926799446344376, 0.331932008266449, 0.04640309885144234, 4.413359874888556e-06, 0.5221070051193237),
('1AQ3', 'RBP', 0.044447898864746094, 0.20111200213432312, 0.26858100295066833, 0.004975699819624424, 1.2850499744171406e-12, 0.48088300228118896),
('1AQ4', 'RBP', 0.01772320084273815, 0.3637459874153137, 0.30899500846862793, 0.0016986100235953927, 0.0, 0.30783700942993164)],
dtype=[('f0', 'S10'), ('f1', 'S10'), ('f2', '<f4'), ('f3', '<f4'), ('f4', '<f4'), ('f5', '<f4'), ('f6', '<f4'), ('f7', '<f4')])
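A minimal sketch of working with such a structured array, assuming Python 3 and NumPy. The field names (`pdb_id`, `kind`, `v0` to `v5`) are hypothetical and not part of the original answer. It uses `U10` so the text fields come back as `str`, and `f8` instead of `f4` so the floats are not rounded to single precision, which is what produces the long digit strings in the output above. Note that a structured array is one-dimensional: each row is a single record, so the shape is (n,) rather than (n, 8).

```python
import numpy as np

# Sample rows copied from the question; in practice they would come from the cursor.
rows = [
    ('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
    ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
]

# Hypothetical field names: two text fields, then six float64 fields.
dt = [('pdb_id', 'U10'), ('kind', 'U10')] + [('v%d' % i, 'f8') for i in range(6)]
rec = np.array(rows, dtype=dt)

print(rec.shape)               # one record per row, so the array is 1-D
print(rec['pdb_id'])           # select a column by field name
print(rec['v0'] + rec['v1'])   # arithmetic works on the numeric fields
```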
You can also hold all of these values in an array with object dtype:
In [46]:
np.array(a, dtype=object)
Out[46]:
array([['1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0],
['1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031,
4.41336e-06, 0.522107],
['1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757,
1.28505e-12, 0.480883],
['1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0,
0.307837]], dtype=object)
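To connect this back to the loop in the question, here is a rough sketch that collects the rows during iteration and converts them once at the end. The `results` tuple below is a stand-in for whatever the database cursor returns, which is an assumption on my part.

```python
import numpy as np

# Stand-in for the rows returned by the database query (assumption).
results = (
    ('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
    ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
    ('1AQ3', 'RBP', 0.0444479, 0.201112, 0.268581, 0.0049757, 1.28505e-12, 0.480883),
    ('1AQ4', 'RBP', 0.0177232, 0.363746, 0.308995, 0.00169861, 0.0, 0.307837),
)

# Collect the rows in a plain list while iterating, then convert once.
rows = []
for row in results:
    rows.append(row)

arr = np.array(rows, dtype=object)
print(arr.shape)   # rows x columns
```

Building a list first and converting once is usually simpler than growing an array inside the loop.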
But this is not ideal for the float values, and it can lead to undesired behavior:
In [48]:
b=np.array(a, dtype=object)
b[0]+b[1] #addition for float values and concatenation for string values
Out[48]:
array(['1A341A9N', 'RBPRBP', 0.0456267, 1.0539268, 0.331932, 0.0464031,
4.41336e-06, 0.522107], dtype=object)
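One possible way around this, sketched under the assumption that the first two columns are always the string identifiers, is to slice the object array into a string part and a float part. The variable names `labels` and `values` are only illustrative.

```python
import numpy as np

b = np.array([
    ('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
    ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
], dtype=object)

labels = b[:, :2].astype(str)     # first two columns: identifiers as strings
values = b[:, 2:].astype(float)   # remaining columns: a regular float64 array

# Arithmetic now touches only the numbers, with no string concatenation.
row_sum = values[0] + values[1]
print(row_sum)
```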
pandas is another option:
In [43]:
import pandas as pd
print(pd.DataFrame(a))
      0    1         2         3         4         5             6         7
0  1A34  RBP  0.000000  1.000000  0.000000  0.000000  0.000000e+00  0.000000
1  1A9N  RBP  0.045627  0.053927  0.331932  0.046403  4.413360e-06  0.522107
2  1AQ3  RBP  0.044448  0.201112  0.268581  0.004976  1.285050e-12  0.480883
3  1AQ4  RBP  0.017723  0.363746  0.308995  0.001699  0.000000e+00  0.307837
In [44]:
pd.DataFrame(a).dtypes
Out[44]:
0 object
1 object
2 float64
3 float64
4 float64
5 float64
6 float64
7 float64
dtype: object
It allows each column to have a different dtype.
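As a small illustration of that point, the sketch below gives the columns names and pulls out only the numeric ones as a plain float array. The column names are hypothetical, and `to_numpy()` assumes a reasonably recent pandas version.

```python
import pandas as pd

rows = [
    ('1A34', 'RBP', 0.0, 1.0, 0.0, 0.0, 0.0, 0.0),
    ('1A9N', 'RBP', 0.0456267, 0.0539268, 0.331932, 0.0464031, 4.41336e-06, 0.522107),
]

# Hypothetical column names, chosen only for this example.
columns = ['pdb_id', 'kind', 'v0', 'v1', 'v2', 'v3', 'v4', 'v5']
df = pd.DataFrame(rows, columns=columns)

# Keep only the numeric columns and convert them to a NumPy array.
numeric = df.select_dtypes(include='number').to_numpy()

print(df.dtypes)
print(numeric.shape)
```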