Sort rows of a 4-column NumPy array and sum the values that share the same coordinates

Time: 2018-10-16 18:03:11

Tags: python arrays numpy

I have an array that looks like this:

4.9 6.14923e-01 -4.7827e-01 -6.8341e+00
1.2 -4.7827e-01 -3.4162e-01 -7.9249e+00
3.4 -4.7827e-01 -6.1492e-01 -6.8341e+00
6.8 -4.7827e-01 -4.7827e-01 -7.4221e+00
5.2 6.14923e-01 -4.7827e-01 -6.8341e+00
1.4 -4.7827e-01 -3.4162e-01 -7.9249e+00
2.6 -4.7827e-01 -3.4162e-01 -6.9302e+00
2.8 -4.7827e-01 -6.1492e-01 -6.8341e+00
5.6 -4.7827e-01 -3.4162e-01 -6.9302e+00
4.1 -4.7827e-01 -4.7827e-01 -7.4221e+00
2.2 -4.7827e-01 -3.4162e-01 -6.9302e+00

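The last three columns are coordinates (x, y, z).

So I basically want to sum the values in the first column for every repeated combination of x, y and z.

After sorting, the output looks like this: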

2.8 -4.7827e-01 -6.1492e-01 -6.8341e+00
3.4 -4.7827e-01 -6.1492e-01 -6.8341e+00
6.8 -4.7827e-01 -4.7827e-01 -7.4221e+00
4.1 -4.7827e-01 -4.7827e-01 -7.4221e+00
1.2 -4.7827e-01 -3.4162e-01 -7.9249e+00
1.4 -4.7827e-01 -3.4162e-01 -7.9249e+00
2.6 -4.7827e-01 -3.4162e-01 -6.9302e+00
5.6 -4.7827e-01 -3.4162e-01 -6.9302e+00
2.2 -4.7827e-01 -3.4162e-01 -6.9302e+00
5.2 6.14923e-01 -4.7827e-01 -6.8341e+00
4.9 6.14923e-01 -4.7827e-01 -6.8341e+00

and then the first column is summed for each unique (x, y, z):

6.2  -4.7827e-01 -6.1492e-01 -6.8341e+00
10.9 -4.7827e-01 -4.7827e-01 -7.4221e+00
2.6  -4.7827e-01 -3.4162e-01 -7.9249e+00
10.4 -4.7827e-01 -3.4162e-01 -6.9302e+00
10.1 6.14923e-01 -4.7827e-01 -6.8341e+00
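
The two steps described above (sort the rows by coordinates, then sum the first column within each run of identical coordinates) can be sketched in plain NumPy roughly as follows. This is only a minimal sketch: it uses a small subset of the data, the names `order`, `is_start`, `starts` and `result` are illustrative, and it assumes that repeated coordinates are exactly equal as floats.

```python
import numpy as np

# Rows are (value, x, y, z); a small subset of the data above.
a = np.array([
    [4.9,  6.14923e-01, -4.7827e-01, -6.8341e+00],
    [3.4, -4.7827e-01, -6.1492e-01, -6.8341e+00],
    [5.2,  6.14923e-01, -4.7827e-01, -6.8341e+00],
    [2.8, -4.7827e-01, -6.1492e-01, -6.8341e+00],
    [6.8, -4.7827e-01, -4.7827e-01, -7.4221e+00],
])

# Step 1: sort rows by x, then y, then z.
# np.lexsort uses the LAST key in the tuple as the primary sort key.
order = np.lexsort((a[:, 3], a[:, 2], a[:, 1]))
s = a[order]

# Step 2: flag every row whose (x, y, z) differs from the row before it.
is_start = np.ones(len(s), dtype=bool)
is_start[1:] = np.any(s[1:, 1:] != s[:-1, 1:], axis=1)
starts = np.flatnonzero(is_start)

# Step 3: sum the first column over each run of identical coordinates.
sums = np.add.reduceat(s[:, 0], starts)
result = np.column_stack((sums, s[starts, 1:]))
print(result)
```

If the coordinates come out of a computation and may differ in the last few digits, rounding them (for example with `np.round`) before sorting would be one way to make the grouping robust.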

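3 answers:

Answer 0: (score: 2)

You can get groupby-like behavior with scipy.sparse.csr_matrix. It takes a little work, though, because a sparse matrix cannot use the three columns you want to group on directly.

However, we can use np.unique to return the unique rows along with the inverse indices. That turns the three columns into a single 1D array of group labels, while still keeping the unique coordinates so they can be added back at the end: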


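import numpy as np
from scipy import sparse

# a is the (n, 4) array from the question
# v: unique (x, y, z) rows; bins: index of the unique row for each input row
v, bins = np.unique(a[:, 1:], axis=0, return_inverse=True)
vals = a[:, 0]

# one stored value per row, placed in the column given by its group label;
# summing down the columns then adds up each group
out = sparse.csr_matrix(
    (vals, bins, np.arange(vals.shape[0]+1)), (vals.shape[0], bins.max()+1)
).sum(0).A1

np.column_stack((out, v))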

array([[ 6.2     , -0.47827 , -0.61492 , -6.8341  ],
       [10.9     , -0.47827 , -0.47827 , -7.4221  ],
       [ 2.6     , -0.47827 , -0.34162 , -7.9249  ],
       [10.4     , -0.47827 , -0.34162 , -6.9302  ],
       [10.1     ,  0.614923, -0.47827 , -6.8341  ]])
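
If you would rather not depend on scipy, the same inverse indices can likely be passed to np.bincount as group labels, with the first column as weights. The following is a minimal sketch under that assumption, not a benchmarked replacement. It uses a small subset of the data, and the names `coords`, `inverse` and `sums` are illustrative.

```python
import numpy as np

# Rows are (value, x, y, z); a small subset of the question's data.
a = np.array([
    [4.9,  6.14923e-01, -4.7827e-01, -6.8341e+00],
    [3.4, -4.7827e-01, -6.1492e-01, -6.8341e+00],
    [5.2,  6.14923e-01, -4.7827e-01, -6.8341e+00],
    [2.8, -4.7827e-01, -6.1492e-01, -6.8341e+00],
    [6.8, -4.7827e-01, -4.7827e-01, -7.4221e+00],
])

# Unique coordinate rows, plus the index of the matching unique row
# for every input row.
coords, inverse = np.unique(a[:, 1:], axis=0, return_inverse=True)

# ravel() keeps the labels 1D in case a NumPy version returns the
# inverse with an extra dimension.
sums = np.bincount(inverse.ravel(), weights=a[:, 0])

out = np.column_stack((sums, coords))
print(out)
```

Because np.unique sorts the unique rows, the result comes back ordered by x, then y, then z, like the sparse version.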

Answer 1: (score: 1)

This kind of problem is easy to solve with pandas; a pure Python or NumPy solution takes more code. I suggest:

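import pandas as pd

# arr is the (n, 4) array from the question; column A holds the values to sum
df = pd.DataFrame(arr, columns=['A','X','Y','Z'])

# group on the coordinates and sum A within each group
new_df = df.groupby(['X','Y','Z'],as_index=False).sum()

# restore the original column order and convert back to a NumPy array
new_arr = new_df[['A','X','Y','Z']].values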

>>> new_arr
array([[ 6.2     , -0.47827 , -0.61492 , -6.8341  ],
       [10.9     , -0.47827 , -0.47827 , -7.4221  ],
       [ 2.6     , -0.47827 , -0.34162 , -7.9249  ],
       [10.4     , -0.47827 , -0.34162 , -6.9302  ],
       [10.1     ,  0.614923, -0.47827 , -6.8341  ]])

# All in one line, without keeping the intermediate steps in variables:

# new_arr = pd.DataFrame(arr).groupby([1,2,3],as_index=False).sum()[[0,1,2,3]].values

Answer 2: (score: 1)

import numpy as np
import pandas as pd

a = [[4.9, 6.14923e-01, -4.7827e-01, -6.8341e+00],
     [1.2, -4.7827e-01, -3.4162e-01, -7.9249e+00],
     [3.4, -4.7827e-01, -6.1492e-01, -6.8341e+00],
     [6.8, -4.7827e-01, -4.7827e-01, -7.4221e+00],
     [5.2, 6.14923e-01, -4.7827e-01, -6.8341e+00],
     [1.4, -4.7827e-01, -3.4162e-01, -7.9249e+00],
     [2.6, -4.7827e-01, -3.4162e-01, -6.9302e+00],
     [2.8, -4.7827e-01, -6.1492e-01, -6.8341e+00],
     [5.6, -4.7827e-01, -3.4162e-01, -6.9302e+00],
     [4.1, -4.7827e-01, -4.7827e-01, -7.4221e+00],
     [2.2, -4.7827e-01, -3.4162e-01, -6.9302e+00]]
a = np.array(a)
df = pd.DataFrame(a)  # columns are labelled 0..3
# attach to every row the sum of column 0 over its (x, y, z) group
df['sum'] = df.groupby([1, 2, 3])[0].transform('sum')
# keep one row per (x, y, z) group, in order of first appearance
df.drop_duplicates(subset=[1, 2, 3])[[1, 2, 3, 'sum']]