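Assuming arrays a and b have the same dimensions and shape, how can I sum the values of array b grouped by the unique values of array a?
In other words, for each unique value in array a, I would like an output containing the sum of the corresponding elements of array b. (In the example below: sum for value 1 = xxx, sum for value 2 = yyy, ... sum for value 11 = zzz)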
a = [[ 5 1 10 11 6]
[ 5 3 8 10 9]
[ 2 1 10 8 7]
[ 7 10 7 8 11]
[10 10 3 0 11]]
b = [[508 220 316 557 737]
[625 419 161 736 426]
[389 608 760 885 232]
[396 309 522 204 842]
[403 831 225 549 797]]
Answer 0: (score: 2)
You can use numpy:
import numpy as np
a = np.array(
[[ 5, 1, 10, 11, 6],
[ 5, 3, 8, 10, 9],
[ 2, 1, 10, 8, 7],
[ 7, 10, 7, 8, 11],
[10, 10, 3, 0, 11]])
b = np.array(
[[508, 220, 316, 557, 737],
[625, 419, 161, 736, 426],
[389, 608, 760, 885, 232],
[396, 309, 522, 204, 842],
[403, 831, 225, 549, 797]])
values = np.unique(a)
# will be [ 0 1 2 3 5 6 7 8 9 10 11]
out = {}
for value in values:
    # sum the elements of b at the positions where a equals this value
    out[value] = sum(b[np.where(a==value)])
print(out)
# {0: 549, 1: 828, 2: 389, 3: 644, 5: 1133, 6: 737, 7: 1150, 8: 1250, 9: 426, 10: 3355, 11: 2196}
Or with a dict comprehension, all in one line:
out = {value: sum(b[np.where(a==value)]) for value in np.unique(a)}
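The loop above scans the whole array once per unique value. A possible vectorized alternative, not part of the original answer, is sketched below. It assumes integer data and combines `np.unique(..., return_inverse=True)` with `np.bincount(..., weights=...)`, so each element is visited only once. The names `inverse` and `sums` are introduced here for illustration.

```python
import numpy as np

a = np.array(
    [[ 5,  1, 10, 11,  6],
     [ 5,  3,  8, 10,  9],
     [ 2,  1, 10,  8,  7],
     [ 7, 10,  7,  8, 11],
     [10, 10,  3,  0, 11]])
b = np.array(
    [[508, 220, 316, 557, 737],
     [625, 419, 161, 736, 426],
     [389, 608, 760, 885, 232],
     [396, 309, 522, 204, 842],
     [403, 831, 225, 549, 797]])

# inverse[i] is the position in `values` of the i-th flattened element of a
values, inverse = np.unique(a.ravel(), return_inverse=True)

# bincount adds up the weights (elements of b) that share the same group index;
# it returns floats, so cast back to b's integer dtype (assumes integer input)
sums = np.bincount(inverse, weights=b.ravel()).astype(b.dtype)

out = dict(zip(values.tolist(), sums.tolist()))
print(out)
# {0: 549, 1: 828, 2: 389, 3: 644, 5: 1133, 6: 737, 7: 1150, 8: 1250, 9: 426, 10: 3355, 11: 2196}
```

Because `np.bincount` accumulates in floating point, this sketch is only exact while the sums stay below 2**53.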
Answer 1: (score: 1)
Or manually:
from itertools import chain
from collections import defaultdict
a = [[ 5, 1, 10, 11, 6],
[ 5, 3, 8, 10, 9],
[ 2, 1, 10, 8, 7],
[ 7, 10, 7, 8, 11],
[10, 10, 3, 0, 11]]
b = [[508, 220, 316, 557, 737],
[625, 419, 161, 736, 426],
[389, 608, 760, 885, 232],
[396, 309, 522, 204, 842],
[403, 831, 225, 549, 797]]
result = defaultdict(int)
# flatten both nested lists and accumulate each b value under its a key
for aa, bb in zip(chain(*a), chain(*b)):
    result[aa] += bb
print(result)
#defaultdict(<class 'int'>, {5: 1133, 1: 828, 10: 3355, 11: 2196, 6: 737, 3: 644, 8: 1250, 9: 426, 2: 389, 7: 1150, 0: 549})
Answer 2: (score: 1)
pandas is a direct and efficient way:
import pandas as pd
# a and b are NumPy arrays, as defined in the numpy answer
df = pd.DataFrame(data=b.ravel(), index=a.ravel())
sums = df.groupby(level=0).sum()
# 0
# 0 549
# 1 828
# 2 389
# 3 644
# 5 1133
# 6 737
# 7 1150
# 8 1250
# 9 426
# 10 3355
# 11 2196
Benchmark:
a=np.random.randint(0,10**4,size=10**5)
b=np.random.randint(0,10**6,size=10**5)
In [19]: %timeit pd.DataFrame(b,a).groupby(level=0).sum()
58.7 ms ± 12.6 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [20]: %timeit for aa, bb in zip(a,b):result[aa] += bb
223 ms ± 36.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [21]: %timeit for value in np.unique(a): out[value] = np.sum(b[np.where(a==value)])
5.67 s ± 933 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
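The timings above come from an IPython session. To repeat a comparison like this as a plain script, a minimal sketch using the standard `timeit` module might look as follows. The fixed seed, the function names, and the `np.bincount` variant are assumptions added for illustration; they are not part of the original benchmark, and no timings are claimed here.

```python
import timeit
from collections import defaultdict

import numpy as np

rng = np.random.default_rng(0)  # fixed seed: an assumption, for repeatability
a = rng.integers(0, 10**4, size=10**5)
b = rng.integers(0, 10**6, size=10**5)

def with_defaultdict():
    # pure-Python accumulation, iterating over plain ints
    result = defaultdict(int)
    for aa, bb in zip(a.tolist(), b.tolist()):
        result[aa] += bb
    return dict(result)

def with_bincount():
    # vectorized grouping: map each element of a to a group index, then sum b per group
    values, inverse = np.unique(a, return_inverse=True)
    sums = np.bincount(inverse, weights=b).astype(np.int64)
    return dict(zip(values.tolist(), sums.tolist()))

# both approaches should produce identical group sums
assert with_defaultdict() == with_bincount()

for fn in (with_defaultdict, with_bincount):
    seconds = min(timeit.repeat(fn, number=1, repeat=3))
    print(f"{fn.__name__}: {seconds * 1e3:.1f} ms")
```

Results depend on the machine and library versions, so they are best compared with each other rather than with the figures quoted above.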