I have an array that looks like this:
4.9 6.14923e-01 -4.7827e-01 -6.8341e+00
1.2 -4.7827e-01 -3.4162e-01 -7.9249e+00
3.4 -4.7827e-01 -6.1492e-01 -6.8341e+00
6.8 -4.7827e-01 -4.7827e-01 -7.4221e+00
5.2 6.14923e-01 -4.7827e-01 -6.8341e+00
1.4 -4.7827e-01 -3.4162e-01 -7.9249e+00
2.6 -4.7827e-01 -3.4162e-01 -6.9302e+00
2.8 -4.7827e-01 -6.1492e-01 -6.8341e+00
5.6 -4.7827e-01 -3.4162e-01 -6.9302e+00
4.1 -4.7827e-01 -4.7827e-01 -7.4221e+00
2.2 -4.7827e-01 -3.4162e-01 -6.9302e+00
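The last three columns are coordinates (x, y, z).
So I basically want to sum the values in the first column over every repeated combination of x, y, and z.
After sorting by coordinates, the rows look like this: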
2.8 -4.7827e-01 -6.1492e-01 -6.8341e+00
3.4 -4.7827e-01 -6.1492e-01 -6.8341e+00
6.8 -4.7827e-01 -4.7827e-01 -7.4221e+00
4.1 -4.7827e-01 -4.7827e-01 -7.4221e+00
1.2 -4.7827e-01 -3.4162e-01 -7.9249e+00
1.4 -4.7827e-01 -3.4162e-01 -7.9249e+00
2.6 -4.7827e-01 -3.4162e-01 -6.9302e+00
5.6 -4.7827e-01 -3.4162e-01 -6.9302e+00
2.2 -4.7827e-01 -3.4162e-01 -6.9302e+00
5.2 6.14923e-01 -4.7827e-01 -6.8341e+00
4.9 6.14923e-01 -4.7827e-01 -6.8341e+00
Then the first column is summed for each unique (x, y, z) combination:
6.2 -4.7827e-01 -6.1492e-01 -6.8341e+00
10.9 -4.7827e-01 -4.7827e-01 -7.4221e+00
2.6 -4.7827e-01 -3.4162e-01 -7.9249e+00
10.4 -4.7827e-01 -3.4162e-01 -6.9302e+00
10.1 6.14923e-01 -4.7827e-01 -6.8341e+00
Answer 0 (score: 2)
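You can get groupby-like behavior with scipy.sparse.csr_matrix. It takes a little work, though, because a sparse matrix cannot group directly on the three columns you want to use as the key.
However, np.unique can return the unique rows together with the inverse indices. That lets us collapse the three columns into a 1D array of group labels while still keeping the unique coordinate rows to add back at the end: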
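import numpy as np
from scipy import sparse

# a is the (n, 4) array from the question
# v: unique (x, y, z) rows; bins: group index of each row of a
v, bins = np.unique(a[:, 1:], axis=0, return_inverse=True)
vals = a[:, 0]
# One matrix row per input value, one column per group;
# summing over the rows gives the total for each group
out = sparse.csr_matrix(
    (vals, bins, np.arange(vals.shape[0]+1)), (vals.shape[0], bins.max()+1)
).sum(0).A1
np.column_stack((out, v))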
array([[ 6.2 , -0.47827 , -0.61492 , -6.8341 ],
[10.9 , -0.47827 , -0.47827 , -7.4221 ],
[ 2.6 , -0.47827 , -0.34162 , -7.9249 ],
[10.4 , -0.47827 , -0.34162 , -6.9302 ],
[10.1 , 0.614923, -0.47827 , -6.8341 ]])
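As a possible simplification, the inverse indices returned by np.unique can also be fed straight to np.bincount, which sums a weights array per integer label and avoids the sparse matrix. This is only a sketch, not part of the original answer: the names coords, inverse, sums, and result are illustrative, and the ravel() call is a defensive assumption for NumPy releases that return the inverse with an extra dimension when axis is given.

```python
import numpy as np

# Sample data from the question: column 0 is the value, columns 1-3 are (x, y, z).
a = np.array([
    [4.9,  6.14923e-01, -4.7827e-01, -6.8341e+00],
    [1.2, -4.7827e-01, -3.4162e-01, -7.9249e+00],
    [3.4, -4.7827e-01, -6.1492e-01, -6.8341e+00],
    [6.8, -4.7827e-01, -4.7827e-01, -7.4221e+00],
    [5.2,  6.14923e-01, -4.7827e-01, -6.8341e+00],
    [1.4, -4.7827e-01, -3.4162e-01, -7.9249e+00],
    [2.6, -4.7827e-01, -3.4162e-01, -6.9302e+00],
    [2.8, -4.7827e-01, -6.1492e-01, -6.8341e+00],
    [5.6, -4.7827e-01, -3.4162e-01, -6.9302e+00],
    [4.1, -4.7827e-01, -4.7827e-01, -7.4221e+00],
    [2.2, -4.7827e-01, -3.4162e-01, -6.9302e+00],
])

# Unique coordinate rows (sorted), plus the group index of every original row.
coords, inverse = np.unique(a[:, 1:], axis=0, return_inverse=True)
inverse = inverse.ravel()  # no-op when the inverse is already 1-D

# Accumulate column 0 per group index.
sums = np.bincount(inverse, weights=a[:, 0])

# Summed value first, followed by the coordinates it belongs to.
result = np.column_stack((sums, coords))
print(result)
```

Because the group labels are dense integers starting at 0, np.bincount returns one sum per unique coordinate row, in the same order as coords.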
Answer 1 (score: 1)
This kind of problem is easy to solve with pandas; a pure Python or NumPy solution takes more code. I suggest:
import pandas as pd
df = pd.DataFrame(arr, columns=['A','X','Y','Z'])
new_df = df.groupby(['X','Y','Z'],as_index=False).sum()
new_arr = new_df[['A','X','Y','Z']].values
>>> new_arr
array([[ 6.2 , -0.47827 , -0.61492 , -6.8341 ],
[10.9 , -0.47827 , -0.47827 , -7.4221 ],
[ 2.6 , -0.47827 , -0.34162 , -7.9249 ],
[10.4 , -0.47827 , -0.34162 , -6.9302 ],
[10.1 , 0.614923, -0.47827 , -6.8341 ]])
# All in one line, without saving intermediate steps to memory:
# new_arr = pd.DataFrame(arr).groupby([1,2,3],as_index=False).sum()[[0,1,2,3]].values
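One detail worth knowing: groupby sorts the group keys by default, which is why the rows above come out ordered by coordinate. If the groups should instead stay in order of first appearance, sort=False can be passed. The following is a sketch under the assumption that arr holds the question's data; the names unsorted and new_arr_unsorted are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question: column A is the value, X/Y/Z are coordinates.
arr = np.array([
    [4.9,  6.14923e-01, -4.7827e-01, -6.8341e+00],
    [1.2, -4.7827e-01, -3.4162e-01, -7.9249e+00],
    [3.4, -4.7827e-01, -6.1492e-01, -6.8341e+00],
    [6.8, -4.7827e-01, -4.7827e-01, -7.4221e+00],
    [5.2,  6.14923e-01, -4.7827e-01, -6.8341e+00],
    [1.4, -4.7827e-01, -3.4162e-01, -7.9249e+00],
    [2.6, -4.7827e-01, -3.4162e-01, -6.9302e+00],
    [2.8, -4.7827e-01, -6.1492e-01, -6.8341e+00],
    [5.6, -4.7827e-01, -3.4162e-01, -6.9302e+00],
    [4.1, -4.7827e-01, -4.7827e-01, -7.4221e+00],
    [2.2, -4.7827e-01, -3.4162e-01, -6.9302e+00],
])
df = pd.DataFrame(arr, columns=['A', 'X', 'Y', 'Z'])

# sort=False keeps each group where its first row appeared in the input.
unsorted = df.groupby(['X', 'Y', 'Z'], as_index=False, sort=False)['A'].sum()

# Back to a NumPy array with the summed column first.
new_arr_unsorted = unsorted[['A', 'X', 'Y', 'Z']].to_numpy()
print(new_arr_unsorted)
```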
Answer 2 (score: 1)
import numpy as np
import pandas as pd
a=[[4.9, 6.14923e-01, -4.7827e-01, -6.8341e+00],
[1.2, -4.7827e-01, -3.4162e-01 ,-7.9249e+00],
[3.4, -4.7827e-01, -6.1492e-01, -6.8341e+00],
[6.8, -4.7827e-01, -4.7827e-01, -7.4221e+00],
[5.2, 6.14923e-01, -4.7827e-01, -6.8341e+00],
[1.4, -4.7827e-01, -3.4162e-01, -7.9249e+00],
[2.6, -4.7827e-01, -3.4162e-01, -6.9302e+00],
[2.8, -4.7827e-01, -6.1492e-01, -6.8341e+00],
[5.6, -4.7827e-01, -3.4162e-01, -6.9302e+00],
[4.1, -4.7827e-01, -4.7827e-01, -7.4221e+00],
[2.2, -4.7827e-01, -3.4162e-01, -6.9302e+00]]
a=np.array(a)
df=pd.DataFrame(a)
df['sum']=df.groupby([1,2,3])[0].transform('sum')
df.drop_duplicates(subset=[1,2,3])[[1,2,3,'sum']]
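The last line above yields a DataFrame with the coordinates first and the sum last, in order of first appearance. If the layout requested in the question is wanted (sum first, rows ordered by coordinate, as a NumPy array), something like the following might work. This is a sketch rather than part of the original answer; the names deduped and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question: column 0 is the value, columns 1-3 are (x, y, z).
a = np.array([
    [4.9,  6.14923e-01, -4.7827e-01, -6.8341e+00],
    [1.2, -4.7827e-01, -3.4162e-01, -7.9249e+00],
    [3.4, -4.7827e-01, -6.1492e-01, -6.8341e+00],
    [6.8, -4.7827e-01, -4.7827e-01, -7.4221e+00],
    [5.2,  6.14923e-01, -4.7827e-01, -6.8341e+00],
    [1.4, -4.7827e-01, -3.4162e-01, -7.9249e+00],
    [2.6, -4.7827e-01, -3.4162e-01, -6.9302e+00],
    [2.8, -4.7827e-01, -6.1492e-01, -6.8341e+00],
    [5.6, -4.7827e-01, -3.4162e-01, -6.9302e+00],
    [4.1, -4.7827e-01, -4.7827e-01, -7.4221e+00],
    [2.2, -4.7827e-01, -3.4162e-01, -6.9302e+00],
])
df = pd.DataFrame(a)

# Broadcast each group's total back onto every row of that group.
df['sum'] = df.groupby([1, 2, 3])[0].transform('sum')

# Keep one row per coordinate triple.
deduped = df.drop_duplicates(subset=[1, 2, 3])

# Order rows by coordinate and put the summed column first.
result = deduped.sort_values([1, 2, 3])[['sum', 1, 2, 3]].to_numpy()
print(result)
```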