I have a numpy array:
[[1,521,3],
[2,543,2],
[3,555,3],
[4,575,2]]
In pandas it looks like this:
Seconds Price Type
1 521 3
2 543 2
3 555 3
4 575 2
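For reference, a frame like the one above can be built from the array by naming its columns. This is a minimal sketch; the variable names `arr` and `df` are assumptions:

```python
import numpy as np
import pandas as pd

# The array from the question; its columns are Seconds, Price, Type
arr = np.array([[1, 521, 3],
                [2, 543, 2],
                [3, 555, 3],
                [4, 575, 2]])

# Name the columns to match the table above
df = pd.DataFrame(arr, columns=['Seconds', 'Price', 'Type'])
print(df)
```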
Then I set the index on it:
types = df['Type'].unique()
df.set_index(['Type','Seconds'], inplace=True)
Output:
Price
Type Seconds
3 1 521
3 3 555
2 2 543
2 4 575
Then I reindex so that every type has a row for every second:
import pandas as pd
frames = []
for i in types:
    df1 = df.xs(i, level=0).reindex([1, 2, 3, 4], fill_value=0).reset_index()
    df1['Type'] = i
    frames.append(df1.set_index(['Type', 'Seconds']))
df = pd.concat(frames)
Output:
Price
Type Seconds
3 1 521
3 2 0
3 3 555
3 4 0
2 1 0
2 2 543
2 3 0
2 4 575
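The per-type loop can also be written as a single reindex against every (Type, Seconds) combination. This is a sketch using `pd.MultiIndex.from_product`, assuming the seconds range is 1 to 4 as in the question:

```python
import numpy as np
import pandas as pd

arr = np.array([[1, 521, 3], [2, 543, 2], [3, 555, 3], [4, 575, 2]])
df = pd.DataFrame(arr, columns=['Seconds', 'Price', 'Type'])
types = df['Type'].unique()          # [3, 2], in order of first appearance

indexed = df.set_index(['Type', 'Seconds'])

# Every (Type, Seconds) combination; pairs absent from the data get Price 0
full_index = pd.MultiIndex.from_product([types, [1, 2, 3, 4]],
                                        names=['Type', 'Seconds'])
result = indexed.reindex(full_index, fill_value=0)
print(result)
```

`result.reset_index().values` then gives the rows as `[Type, Seconds, Price]`.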
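This is easy to do in pandas. How can I do the same thing in numpy? The result should look like the values of the reindexed frame, i.e. rows of [Type, Seconds, Price] as given by df.reset_index().values.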
Answer 0 (score: 0)
Here is one approach you can use.
import numpy as np
ar = np.array([[1,521,3], [2,543,2], [3,555,3], [4,575,2]])
ar
Out[50]:
array([[ 1, 521, 3],
[ 2, 543, 2],
[ 3, 555, 3],
[ 4, 575, 2]])
Determine the expanded index, i.e. every combination of Type and Seconds:
u0 = np.unique(ar[:, 0])
u2 = np.unique(ar[:, 2])
rowcount = u0.shape[0]*u2.shape[0]
rows = np.stack([np.repeat(u2, rowcount//u2.shape[0]),
                 np.tile(u0, rowcount//u0.shape[0])],
                1)
rows
Out[51]:
array([[2, 1],
[2, 2],
[2, 3],
[2, 4],
[3, 1],
[3, 2],
[3, 3],
[3, 4]])
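The repeat/tile construction is a Cartesian product of the unique types and seconds. As a sketch, the same grid can be produced with `np.meshgrid`; the name `rows_alt` is hypothetical:

```python
import numpy as np

ar = np.array([[1, 521, 3], [2, 543, 2], [3, 555, 3], [4, 575, 2]])
u0 = np.unique(ar[:, 0])   # seconds: [1, 2, 3, 4]
u2 = np.unique(ar[:, 2])   # types:   [2, 3]

# indexing='ij' makes the first argument (type) vary slowest,
# matching the repeat/tile construction
t, s = np.meshgrid(u2, u0, indexing='ij')
rows_alt = np.column_stack([t.ravel(), s.ravel()])
print(rows_alt)
```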
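Work out which grid rows are missing from your array. np.unique(..., return_index=True) reports the first occurrence of each distinct (Type, Seconds) pair, so pairs already present in ar point into its first ar.shape[0] rows and only the missing pairs point into rows (this assumes each pair appears at most once in ar):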
row_index = np.sort(np.unique(np.concatenate([ar[:, [2, 0]], rows]),
                              return_index=True, axis=0)[1])
missing = rows[row_index[ar.shape[0]:]-ar.shape[0]]
missing
Out[52]:
array([[2, 1],
[2, 3],
[3, 2],
[3, 4]])
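A different way to find the missing pairs, not the one used in this answer, is a broadcast comparison of every grid row against every data row. A sketch follows; it uses memory proportional to len(rows) × len(ar), so it is only suited to small arrays, but it does not depend on the first-occurrence ordering:

```python
import numpy as np

ar = np.array([[1, 521, 3], [2, 543, 2], [3, 555, 3], [4, 575, 2]])
u0 = np.unique(ar[:, 0])
u2 = np.unique(ar[:, 2])
rows = np.column_stack([np.repeat(u2, u0.size), np.tile(u0, u2.size)])

present = ar[:, [2, 0]]   # (Type, Seconds) pairs found in the data
# Shape (len(rows), len(ar)): True where a grid row equals a data row
matches = (rows[:, None, :] == present[None, :, :]).all(axis=2)
missing = rows[~matches.any(axis=1)]
print(missing)
```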
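Then combine them: the original rows go first, with columns reordered to Type, Seconds, Price, and the missing rows follow with a price of 0: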
reindexed = np.zeros((rowcount, ar.shape[1]), int)
reindexed[:ar.shape[0], [1, 2, 0]] = ar
reindexed[ar.shape[0]:, [0, 1]] = missing
reindexed
Out[53]:
array([[ 3, 1, 521],
[ 2, 2, 543],
[ 3, 3, 555],
[ 2, 4, 575],
[ 2, 1, 0],
[ 2, 3, 0],
[ 3, 2, 0],
[ 3, 4, 0]])
Sort, if needed (np.lexsort uses the last key as the primary key, so this sorts by Type, then Seconds):
reindexed[np.lexsort([reindexed[:, 1], reindexed[:, 0]])]
Out[49]:
array([[ 2, 1, 0],
[ 2, 2, 543],
[ 2, 3, 0],
[ 2, 4, 575],
[ 3, 1, 521],
[ 3, 2, 0],
[ 3, 3, 555],
[ 3, 4, 0]])
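If the sorted layout is the goal anyway, a different technique (a scatter into a dense grid, not the method above) skips both the missing-row search and the final sort. This is a sketch assuming each (Type, Seconds) pair occurs at most once; with duplicates only one of the prices would be kept:

```python
import numpy as np

ar = np.array([[1, 521, 3], [2, 543, 2], [3, 555, 3], [4, 575, 2]])

# Sorted unique keys, plus each row's position within them
types, t_idx = np.unique(ar[:, 2], return_inverse=True)
secs, s_idx = np.unique(ar[:, 0], return_inverse=True)

# Dense (n_types, n_seconds) grid of prices; absent combinations stay 0
prices = np.zeros((types.size, secs.size), dtype=ar.dtype)
prices[t_idx, s_idx] = ar[:, 1]

# Flatten into [Type, Seconds, Price] rows, already in sorted order
t, s = np.meshgrid(types, secs, indexing='ij')
result = np.column_stack([t.ravel(), s.ravel(), prices.ravel()])
print(result)
```

Like the answer, this only uses seconds that actually appear in the data; a fixed range such as 1 to 4 would need `secs` built explicitly.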