Question

I have a list in the following format:

x = [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]

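Algorithm:

Combine all inner lists that share the same first two values; the third value does not have to match for them to be combined
- e.g. "hello",0,5 is combined with "hello",0,8

The third value becomes the average of the third values: sum(all 3rd vals) / len(all 3rd vals)

Note: by all 3rd vals I mean the 3rd value of each inner list within a group of duplicates

e.g. "hello",0,5 and "hello",0,8 become hello,0,6.5

Desired output: (the order of the list does not matter)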

x = [["hello",0,6.5], ["hi",0,6], ["hello",1,1]]

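Question:

How can I implement this algorithm in Python?

Ideally it would be efficient, as it will be used on very large lists.

If anything is unclear, let me know and I will explain.

EDIT: I tried converting the list to a set to remove duplicates, but that does not take the third value of the inner lists into account, so it does not work.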
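To illustrate why the set approach falls short, here is a minimal sketch. The names set_of_lists_works and as_tuples are made up for this example; it only shows that a set removes exact duplicates and so cannot merge rows that differ in the third value.

```python
x = [["hello", 0, 5], ["hi", 0, 6], ["hello", 0, 8], ["hello", 1, 1]]

# Lists are unhashable, so the inner lists cannot go into a set directly.
try:
    set(x)
    set_of_lists_works = True
except TypeError:
    set_of_lists_works = False

# Tuples are hashable, but a set only drops exact duplicates, so
# ("hello", 0, 5) and ("hello", 0, 8) both survive.
as_tuples = set(tuple(sub) for sub in x)

print(set_of_lists_works)  # False
print(len(as_tuples))      # 4
```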

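Solution performance:

Thanks to everyone who provided a solution to this question! Here are the results based on a speed test of all the functions: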
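A comparison of this kind could be run with timeit. The sketch below is an assumption about the setup, not the test that was actually used: the two functions (running_sum, collect_then_average) and the random test data are hypothetical stand-ins, and the timings it prints will vary by machine.

```python
import random
import timeit
from collections import defaultdict

def running_sum(lst):
    # Keep [total, count] per (first, second) key.
    acc = {}
    for a, b, v in lst:
        slot = acc.setdefault((a, b), [0, 0])
        slot[0] += v
        slot[1] += 1
    return [[a, b, t / c] for (a, b), (t, c) in acc.items()]

def collect_then_average(lst):
    # Store every third value, then average each list at the end.
    groups = defaultdict(list)
    for a, b, v in lst:
        groups[(a, b)].append(v)
    return [[a, b, sum(vs) / len(vs)] for (a, b), vs in groups.items()]

# Hypothetical test data: 100,000 rows drawn from a small key space.
random.seed(0)
data = [
    [random.choice(["hello", "hi"]), random.randrange(50), random.randrange(100)]
    for _ in range(100_000)
]

for fn in (running_sum, collect_then_average):
    seconds = timeit.timeit(lambda: fn(data), number=5)
    print(f"{fn.__name__}: {seconds:.3f} s for 5 runs")
```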

Answer 1

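Using a running sum and count

I figured out how to improve my previous code (see the original below). You can keep a running total and count, then calculate the average at the end, which avoids recording every individual number.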

from collections import defaultdict

class RunningAverage:
    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def calculate(self):
        return self.total / self.count

def func(lst):
    thirds = defaultdict(RunningAverage)
    for sub in lst:
        k = tuple(sub[:2])
        thirds[k].add(sub[2])
    lst_out = [[*k, v.calculate()] for k, v in thirds.items()]
    return lst_out

print(func(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]

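Original answer

This probably isn't very efficient, since it has to accumulate all the values in order to average them. I think you could get around that with a weighted average, but I'm not sure how to do that.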

from collections import defaultdict

def avg(nums):
    return sum(nums) / len(nums)

def func(lst):
    thirds = defaultdict(list)
    for sub in lst:
        k = tuple(sub[:2])
        thirds[k].append(sub[2])
    lst_out = [[*k, avg(v)] for k, v in thirds.items()]
    return lst_out

print(func(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
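The weighted-average idea can be sketched as an incremental mean: each new value moves the stored average by 1/n of the gap between them, so no list of values is kept. This is a sketch under that assumption, and func_incremental is a hypothetical name. With non-integer data the result can differ from sum/len in the last few digits because the rounding happens in a different order.

```python
from collections import defaultdict

def func_incremental(lst):
    # means[k] holds the current average, counts[k] how many values fed it.
    means = defaultdict(float)
    counts = defaultdict(int)
    for sub in lst:
        k = tuple(sub[:2])
        counts[k] += 1
        # Shift the old mean toward the new value by 1/n of the gap.
        means[k] += (sub[2] - means[k]) / counts[k]
    return [[*k, means[k]] for k in means]

x = [["hello", 0, 5], ["hi", 0, 6], ["hello", 0, 8], ["hello", 1, 1]]
print(func_incremental(x))  # [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
```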

Answer 2

You can try using groupby.

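from itertools import groupby

m = [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]

# Group on the (string, int) pair itself; a concatenated string key such as
# x[0]+str(x[1]) would merge ("a1", 2) with ("a", 12).
def key(x):
    return (x[0], x[1])

m.sort(key=key)  # groupby only merges adjacent items, so sort first

for (name, num), group in groupby(m, key):
    ss = 0
    c = 0
    for k in group:
        ss += k[2]
        c += 1
    print([name, num, ss / c])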

Answer 3

This should be O(N); someone correct me if I'm wrong:

def my_algorithm(input_list):
    """
    :param input_list: list of lists in format [string, int, int]
    :return: list
    """

    # Dict in format (string, int): [int, count_int]
    # So our list is in this format, example:
    # [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
    # so for our dict we will make keys a tuple of the first 2 values of each sublist (since that needs to be unique)
    # while values are a list of third element from our sublist + counter (which counts every time we have a duplicate
    # key, so we can divide it and get average).
    my_dict = {}
    for element in input_list:
        # key is a tuple of the first 2 values of each sublist
        key = (element[0], element[1])
        if key not in my_dict:
            # If the key does not exist, add it.
            # Value is in form of third element from our sublist + counter. Since this is first value set counter to 1
            my_dict[key] = [element[2], 1]
        else:
            # If key does exist then increment our value and increment counter by 1
            my_dict[key][0] += element[2]
            my_dict[key][1] += 1

    # we have a dict so we will need to convert it to list (and on the way calculate averages)
    return _convert_my_dict_to_list(my_dict)


def _convert_my_dict_to_list(my_dict):
    """
    :param my_dict: dict, key is in form of tuple (string, int) and values are in form of list [int, int_counter]
    :return: list
    """
    my_list = []
    for key, value in my_dict.items():
        sublist = [key[0], key[1], value[0]/value[1]]
        my_list.append(sublist)
    return my_list

my_algorithm(x)

This returns:

[['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]

Your expected return is:

[["hello", 0, 6.5], ["hi", 0, 6], ["hello", 1, 1]]

If you really need integers, you can modify the _convert_my_dict_to_list function.
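One possible modification, as a sketch only: convert_with_ints is a hypothetical variant of _convert_my_dict_to_list that returns an int whenever the average has no fractional part. It assumes the same dict layout, (string, int) -> [total, count].

```python
def convert_with_ints(my_dict):
    # Hypothetical variant of _convert_my_dict_to_list: my_dict maps
    # (string, int) -> [total, count], as built by my_algorithm.
    my_list = []
    for (name, number), (total, count) in my_dict.items():
        average = total / count
        # Use an int when the average has no fractional part.
        if average.is_integer():
            average = int(average)
        my_list.append([name, number, average])
    return my_list

sample = {("hello", 0): [13, 2], ("hi", 0): [6, 1], ("hello", 1): [1, 1]}
print(convert_with_ints(sample))  # [['hello', 0, 6.5], ['hi', 0, 6], ['hello', 1, 1]]
```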

Answer 4

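Here's my variation on this theme: a groupby without the expensive sort. I also changed the problem so that the input and output are lists of tuples, since these are fixed-size records: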

from itertools import groupby
from operator import itemgetter
from collections import defaultdict

data = [("hello", 0, 5), ("hi", 0, 6), ("hello", 0, 8), ("hello", 1, 1)]

dictionary = defaultdict(complex)

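for key, group in groupby(data, itemgetter(slice(2))):
    # A run of adjacent duplicates arrives as one group, so count every
    # item in it rather than adding 1 per group.
    values = [value for (string, number, value) in group]
    dictionary[key] += complex(sum(values), len(values))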

array = [(*key, value.real / value.imag) for key, value in dictionary.items()]

print(array)

Output

> python3 test.py
[('hello', 0, 6.5), ('hi', 0, 6.0), ('hello', 1, 1.0)]
>

Thanks to @wjandrea for suggesting itemgetter in place of lambda. (And yes, I am using complex numbers to track the total and count for the average.)
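For context, groupby only merges consecutive items with equal keys, which is why an unsorted pass still needs a dictionary to combine runs of the same key. A minimal sketch of that behaviour (first_two, unsorted_keys and sorted_keys are names made up for this example):

```python
from itertools import groupby

data = [("hello", 0, 5), ("hi", 0, 6), ("hello", 0, 8), ("hello", 1, 1)]

def first_two(record):
    return record[:2]

# Unsorted: ("hello", 0) shows up as two separate runs.
unsorted_keys = [key for key, _ in groupby(data, first_two)]

# Sorted: each key forms exactly one run.
sorted_keys = [key for key, _ in groupby(sorted(data), first_two)]

print(unsorted_keys)  # [('hello', 0), ('hi', 0), ('hello', 0), ('hello', 1)]
print(sorted_keys)    # [('hello', 0), ('hello', 1), ('hi', 0)]
```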

Python: Remove list duplicates based on the first two inner list values
