Question

I have a list of (x, y) tuples, such as:

l = [(2,1), (4,6), (3,1), (2,7), (7,10)]

Now I want to create a new list:

l = [(2.5,1), (4,6), (2,7), (7,10)]

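In the new list, whenever several tuples share the same second value (y), they are replaced by a single tuple whose first value (x) is the average of their first values.

Here, the tuples (2,1) and (3,1) share y = 1, so the new list contains the average of x = 2 and x = 3, which is 2.5. No other y value is repeated, so the remaining tuples are unchanged.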

Answer 1

Since you tagged pandas:

import pandas as pd

l = [(2,1), (4,6), (3,1), (2,7), (7,10)]
df = pd.DataFrame(l)

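df is then a DataFrame with two columns, labelled 0 and 1.

Now you want the mean of the numbers in column 0, taken over the rows that share the same value in column 1: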

(df.groupby(1).mean()     # compute mean on each group
   .reset_index()[[0,1]]  # restore the column order
   .values                # return the underlying numpy array
 )

Output:

array([[ 2.5,  1. ],
       [ 4. ,  6. ],
       [ 2. ,  7. ],
       [ 7. , 10. ]])

Answer 2

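First build a hash table/dictionary that uses each second element as a key and the list of its corresponding first elements as the value. Then a list comprehension over the dictionary's items produces the desired output.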

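from collections import defaultdict

l = [(2,1), (4,6), (3,1), (2,7), (7,10)]

# Map each second element (y) to the list of first elements (x) seen with it
out = defaultdict(list)
for x, y in l:
    out[y].append(x)
# Average each list of x values and put the key back in second position
out = [(sum(v)/len(v), k) for k, v in out.items()]
print(out)
#prints [(2.5, 1), (4.0, 6), (2.0, 7), (7.0, 10)]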
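
The same idea can be wrapped into a reusable function. This is only a minimal sketch: the name `average_by_second` is hypothetical and not part of the answer, and it assumes Python 3.8+ (for `statistics.fmean` and for dictionaries keeping insertion order, so groups come out in order of first appearance).

```python
from collections import defaultdict
from statistics import fmean


def average_by_second(pairs):
    """Average the first elements of (x, y) pairs that share the same y."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[y].append(x)
    # dicts keep insertion order, so each y appears in order of first occurrence
    return [(fmean(xs), y) for y, xs in groups.items()]


pairs = [(2, 1), (4, 6), (3, 1), (2, 7), (7, 10)]
result = average_by_second(pairs)
print(result)  # [(2.5, 1), (4.0, 6), (2.0, 7), (7.0, 10)]
```

Because each tuple is visited once, this runs in linear time and does not require the input to be sorted.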

Answer 3

Another approach, using itertools.groupby:

from itertools import groupby

# Sort list by the second element
sorted_list = sorted(l,key=lambda x:x[1])

# Group by second element
grouped_list = groupby(sorted_list, key=lambda x:x[1])

result = []
for _,group in grouped_list:
    x,y = list(zip(*group))
    # Take the mean of the first elements
    result.append((sum(x) / len(x),y[0]))

This gives:

[(2.5, 1), (4.0, 6), (2.0, 7), (7.0, 10)]
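
The sort step matters because `itertools.groupby` only merges consecutive items with equal keys. The short sketch below illustrates this on the question's data; the variable names are illustrative only.

```python
from itertools import groupby

pairs = [(2, 1), (4, 6), (3, 1), (2, 7), (7, 10)]


def second(pair):
    """Key function: the second element of the tuple."""
    return pair[1]


# Without sorting, the two tuples with y = 1 are not adjacent,
# so groupby reports y = 1 as two separate groups.
unsorted_keys = [key for key, _ in groupby(pairs, key=second)]

# After sorting by y, every key appears exactly once.
sorted_keys = [key for key, _ in groupby(sorted(pairs, key=second), key=second)]

print(unsorted_keys)  # [1, 6, 1, 7, 10]
print(sorted_keys)    # [1, 6, 7, 10]
```

One consequence is that the result is ordered by y rather than by first appearance, and the sort adds an O(n log n) cost.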

Answer 4

Here is an approach using numpy.bincount. It relies on the labels being non-negative integers. (If that is not the case, you can first apply np.unique(i, return_inverse=True).)

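import numpy as np

# w: first elements (used as weights), i: second elements (used as labels)
w,i = zip(*l)
# n: per-label sum of w, d: per-label count
n,d = np.bincount(i,w), np.bincount(i)
# v: the labels that actually occur (nonzero count)
v, = np.where(d)
[*zip(n[v]/d[v],v)]
# [(2.5, 1), (4.0, 6), (2.0, 7), (7.0, 10)]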
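
For labels that are not non-negative integers, the `np.unique(..., return_inverse=True)` step mentioned above could look roughly like this. It is a sketch under assumptions: the string-labelled data is hypothetical and not from the question, and the results are ordered by sorted label rather than by first appearance.

```python
import numpy as np

# Hypothetical data: the labels are strings, so np.bincount cannot use them directly
pairs = [(2, "a"), (4, "b"), (3, "a"), (2, "c"), (7, "d")]
w, labels = zip(*pairs)

# uniq holds the sorted distinct labels; inv gives, for each item,
# the index of its label in uniq (a non-negative integer)
uniq, inv = np.unique(labels, return_inverse=True)

sums = np.bincount(inv, weights=w)  # per-label sum of the first elements
counts = np.bincount(inv)           # per-label number of items
means = sums / counts

# Convert NumPy scalars back to plain Python values for a clean list of tuples
result = [(float(m), str(u)) for m, u in zip(means, uniq)]
print(result)  # [(2.5, 'a'), (4.0, 'b'), (2.0, 'c'), (7.0, 'd')]
```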

Group a list of tuples by the second element and take the average of the first elements
