Question

I have an array that looks like this:

 array([[ 0,  1,  2],
        [ 1,  1,  6],
        [ 2,  2, 10],
        [ 3,  2, 14]])

I would like to sum the values of the third column for rows that share the same value in the second column, so the result is:

 array([[ 0,  1,  8],
        [ 1,  2, 24]])

I started coding this, but I'm stuck on the sum.

Answer 1

You can use pandas to vectorize your algorithm:

import pandas as pd, numpy as np

A = np.array([[ 0,  1,  2],
              [ 1,  1,  6],
              [ 2,  2, 10],
              [ 3,  2, 14]])

df = pd.DataFrame(A)\
       .groupby(1, as_index=False)\
       .sum()\
       .reset_index()

res = df[['index', 1, 2]].values

Result

array([[ 0,  1,  8],
       [ 1,  2, 24]], dtype=int64)

Answer 2

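If your data is sorted by the second column, you can build a pure numpy solution around np.add.reduceat. A combination of np.nonzero (or np.where) applied to np.diff gives you the positions where the second column switches values. You can use those indices to do the sum reduction. The other columns are fairly formulaic, so you can concatenate them back in quite easily: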

import numpy as np

A = np.array([[ 0,  1,  2],
              [ 1,  1,  6],
              [ 2,  2, 10],
              [ 3,  2, 14]])
# Find the split indices
i = np.nonzero(np.diff(A[:, 1]))[0] + 1
i = np.insert(i, 0, 0)
# Compute the result columns
c0 = np.arange(i.size)
c1 = A[i, 1]
c2 = np.add.reduceat(A[:, 2], i)
# Concatenate the columns
result = np.c_[c0, c1, c2]

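Note the +1 in the indices. That is because, given how reduceat works, you always want the position after the switch, not before it. Inserting zero as the first index can also be done with np.r_, np.concatenate, etc.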
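
As a rough illustration of the +1 and of the np.r_ alternative, the sketch below prints each intermediate array. The variable names (`keys`, `changes`, `before`, `starts`) are made up for this example, and it still assumes the rows are already sorted by the second column.

```python
import numpy as np

A = np.array([[0, 1, 2],
              [1, 1, 6],
              [2, 2, 10],
              [3, 2, 14]])

keys = A[:, 1]
# Non-zero wherever two consecutive keys differ.
changes = np.diff(keys)
# Index of the row just BEFORE each switch.
before = np.nonzero(changes)[0]
# Shift by one to land on the first row of each new group,
# and prepend 0 (here with np.r_) for the start of the first group.
starts = np.r_[0, before + 1]

# reduceat sums A[:, 2] from each start index up to the next one.
sums = np.add.reduceat(A[:, 2], starts)
print(changes, before, starts, sums)
```

Without the +1, the second segment would begin on the last row of the first group, so that row's value would be counted in the wrong sum.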

That being said, I still think you are looking for the pandas version in @jpp's answer.

Answer 3

Here is my solution, using only numpy arrays...

import numpy as np
arr = np.array([[ 0,  1,  2], [ 1,  1,  6], [ 2,  2, 10], [ 3,  2, 14]])

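lst = []
compt = 0  # running group number, used as the first output column
# Assumes the keys in column 1 are the consecutive integers 1..max
for index in range(1, max(arr[:, 1]) + 1):
    lst.append([compt, index, np.sum(arr[arr[:, 1] == index][:, 2])])
    compt += 1
lst = np.array(lst)
print(lst)
# lst, outputs...
# [[ 0  1  8]
#  [ 1  2 24]]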

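The tricky part is np.sum(arr[arr[:, 1] == index][:, 2]), so let's break it down into parts.

arr[arr[:, 1] == index] means...

You have an array arr, and we ask numpy for the rows whose second column (the column at index 1) matches the current value of the for loop. Here, that value runs from 1 to the maximum of the elements in the second column. Printing only this expression inside the for loop gives...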

# First iteration
[[0 1 2]
 [1 1 6]]
# Second iteration
[[ 2  2 10]
 [ 3  2 14]]

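Adding [:, 2] to our expression means that we want the values of the 3rd column (index 2) of the arrays above. If I print arr[arr[:, 1] == index][:, 2], it gives me [2, 6] on the first iteration and [10, 14] on the second.
I then just need to sum these values with np.sum() and format my output list accordingly. :)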
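
The loop above relies on the keys being the consecutive integers from 1 to the maximum. As a possible alternative, not taken from this answer, the sketch below swaps the loop for np.unique with return_inverse plus np.add.at, which should also cope with unsorted or non-consecutive keys. The sample data here is hypothetical and chosen to show that case.

```python
import numpy as np

# Hypothetical data: keys in column 1 are unsorted and not consecutive.
arr = np.array([[0, 5, 2],
                [1, 9, 6],
                [2, 5, 10],
                [3, 9, 14]])

# keys: sorted distinct values of column 1.
# inverse: for each row, the position of its key within `keys`.
keys, inverse = np.unique(arr[:, 1], return_inverse=True)

# Accumulate column 2 into one slot per key (unbuffered, so repeats add up).
sums = np.zeros(keys.size, dtype=arr.dtype)
np.add.at(sums, inverse, arr[:, 2])

# Group number, key, summed value.
result = np.column_stack([np.arange(keys.size), keys, sums])
print(result)
```

np.add.at keeps the integer dtype of the input; np.bincount with weights would also work but returns floats.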

Answer 4

Use a dictionary to store the values, then convert back to a list:

x = [[ 0,  1,  2],
     [ 1,  1,  6],
     [ 2,  2, 10],
     [ 3,  2, 14]]

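y = {}
for val in x:
    if val[1] in y:
        # Key already seen: add the third-column value to the running total
        y[val[1]][2] += val[2]
    else:
        # First row for this key: store a copy so x itself is not mutated
        y[val[1]] = list(val)
# Renumber the first column so the groups are indexed 0, 1, ...
print([[i, key, row[2]] for i, (key, row) in enumerate(y.items())])
# [[0, 1, 8], [1, 2, 24]]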

Answer 5

To get the exact output, use pandas:

import pandas as pd
import numpy as np

a = np.array([[ 0,  1,  2],
              [ 1,  1,  6],
              [ 2,  2, 10],
              [ 3,  2, 14]])

df = pd.DataFrame(a)
# Sum only column 2 per group; as_matrix() was removed in pandas 1.0, so use to_numpy()
df.groupby(1)[2].sum().reset_index().reset_index().to_numpy()
#[[ 0  1  8]
# [ 1  2 24]]

Answer 6

You can also use a defaultdict and sum the values:

from collections import defaultdict

x = [[ 0,  1,  2],
     [ 1,  1,  6],
     [ 2,  2, 10],
     [ 3,  2, 14]]

res = defaultdict(int)
for val in x:
    # Accumulate the third-column value under its second-column key
    res[val[1]] += val[2]
print([[i, val, res[val]] for i, val in enumerate(res)])

Numpy array: group by one column, sum another
