Python: Remove list duplicates based on the first two inner list values

Time: 2019-12-07 19:20:54

Tags: python python-3.x processing-efficiency

Question:

I have a list in the following format:

x = [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]

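Algorithm:

  • Group together all inner lists that share the same first two values; the third value does not have to match for lists to be grouped
    • e.g. "hello",0,5 and "hello",0,8 are grouped together
    • but they are not grouped with "hello",1,1
  • The third value becomes the average of the third values: sum(all 3rd vals) / len(all 3rd vals)
    • Note: "all 3rd vals" means the third value of each inner list in a group of duplicates
    • e.g. "hello",0,5 and "hello",0,8 become "hello",0,6.5

Desired output (the order of the lists does not matter):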

x = [["hello",0,6.5], ["hi",0,6], ["hello",1,1]]

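Question:

  • How can I implement this algorithm in Python?

Ideally it would be efficient, since it will be used on very large lists.

If anything is unclear, let me know and I will explain.

Edit: I tried converting the list to a set to remove duplicates, but that does not take the third value of the inner lists into account, so it does not work.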
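A minimal sketch of why the set approach falls short, assuming the inner lists are first converted to tuples (lists themselves are not hashable). Rows that differ only in the third value stay distinct, so nothing gets merged, while deduplicating on the first two values alone discards the third values entirely.

```python
x = [["hello", 0, 5], ["hi", 0, 6], ["hello", 0, 8], ["hello", 1, 1]]

# Lists are unhashable, so each row must become a tuple before it can go in a set.
as_set = set(tuple(row) for row in x)

# ("hello", 0, 5) and ("hello", 0, 8) differ in the third value,
# so the set keeps both and no averaging can happen.
print(len(as_set))  # 4

# Deduplicating on the first two values alone does find the groups,
# but it throws the third values away.
keys_only = set((row[0], row[1]) for row in x)
print(len(keys_only))  # 3
```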

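Solution performance:

Thanks to everyone who provided a solution to this question! Here are the results of speed-testing all of the functions:

[Figure: Performance Data]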
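A rough sketch of how such a speed test could be set up with the standard-library timeit module. The function name average_by_key, the synthetic data, and the sizes are all assumptions made for illustration; they are not the setup behind the results above.

```python
import random
import timeit
from collections import defaultdict

def average_by_key(rows):
    """Average the third value of rows sharing the same first two values."""
    totals = defaultdict(lambda: [0, 0])  # key -> [running total, count]
    for first, second, value in rows:
        entry = totals[(first, second)]
        entry[0] += value
        entry[1] += 1
    return [[first, second, total / count]
            for (first, second), (total, count) in totals.items()]

# Synthetic input: the sizes and value ranges are arbitrary assumptions.
random.seed(0)
words = ["hello", "hi", "hey", "yo"]
rows = [[random.choice(words), random.randrange(10), random.randrange(100)]
        for _ in range(10_000)]

# Time a few repetitions; absolute numbers depend on the machine.
runs = 5
elapsed = timeit.timeit(lambda: average_by_key(rows), number=runs)
print(f"{elapsed / runs:.4f} s per call")
```

To compare several candidate functions, the same rows can be passed to each one inside its own timeit call.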

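4 Answers:

Answer 0 (score: 2)

Using a running sum and count

I figured out how to improve my previous code (see the original below). You can keep a running total and count, then compute the average at the end, which avoids storing every individual number.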

from collections import defaultdict

class RunningAverage:
    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def calculate(self):
        return self.total / self.count

def func(lst):
    thirds = defaultdict(RunningAverage)
    for sub in lst:
        k = tuple(sub[:2])
        thirds[k].add(sub[2])
    lst_out = [[*k, v.calculate()] for k, v in thirds.items()]
    return lst_out

print(func(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]

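Original answer

This may not be very efficient, since it has to accumulate all the values in order to compute the average. I think you could get around that by using a weighted average, but I'm not sure how to do it.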

from collections import defaultdict

def avg(nums):
    return sum(nums) / len(nums)

def func(lst):
    thirds = defaultdict(list)
    for sub in lst:
        k = tuple(sub[:2])
        thirds[k].append(sub[2])
    lst_out = [[*k, avg(v)] for k, v in thirds.items()]
    return lst_out

print(func(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
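One way the weighted-average idea could look is an incremental mean, where each new value shifts the stored mean by 1/count of the difference. This is only a sketch: the name func_incremental is hypothetical, and the technique is a standard running-mean update rather than something taken from the answer itself.

```python
from collections import defaultdict

def func_incremental(lst):
    # key -> [current mean, number of values seen so far]
    state = defaultdict(lambda: [0.0, 0])
    for sub in lst:
        entry = state[tuple(sub[:2])]
        entry[1] += 1
        # Move the mean toward the new value by 1/count of the difference.
        entry[0] += (sub[2] - entry[0]) / entry[1]
    return [[*k, mean] for k, (mean, _count) in state.items()]

x = [["hello", 0, 5], ["hi", 0, 6], ["hello", 0, 8], ["hello", 1, 1]]
print(func_incremental(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
```

Compared with keeping a total and a count, this never holds a large running sum, at the cost of one division per element.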

Answer 1 (score: 2)

You can try using groupby:

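from itertools import groupby

m = [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]

# Group on the first two values as a tuple; concatenating them into a string
# could make different pairs collide (e.g. "a1", 0 and "a", 10 both give "a10").
key = lambda x: (x[0], x[1])

# groupby only merges adjacent items, so sort by the same key first
m.sort(key=key)

for (name, num), group in groupby(m, key):
    ss = 0
    c = 0
    for k in group:
        ss += k[2]
        c += 1
    print([name, num, ss / c])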
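A small sketch of why the sort matters for this approach: itertools.groupby starts a new group whenever the key changes, so equal keys that are not adjacent end up in separate groups. The sample rows and variable names here are illustrative assumptions.

```python
from itertools import groupby

rows = [["hello", 0, 5], ["hi", 0, 6], ["hello", 0, 8]]
key = lambda row: (row[0], row[1])

# Unsorted: the two ("hello", 0) rows are not adjacent, so they form two groups.
unsorted_keys = [k for k, _ in groupby(rows, key)]
print(unsorted_keys)  # [('hello', 0), ('hi', 0), ('hello', 0)]

# Sorted by the same key: equal keys are adjacent and form one group each.
sorted_keys = [k for k, _ in groupby(sorted(rows, key=key), key)]
print(sorted_keys)  # [('hello', 0), ('hi', 0)]
```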

Answer 2 (score: 2)

This should be O(N); if I'm wrong, someone please correct me:

def my_algorithm(input_list):
    """
    :param input_list: list of lists in format [string, int, int]
    :return: list
    """

    # Dict in format (string, int): [int, count_int]
    # So our list is in this format, example:
    # [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
    # so for our dict we will make keys a tuple of the first 2 values of each sublist (since that needs to be unique)
    # while values are a list of third element from our sublist + counter (which counts every time we have a duplicate
    # key, so we can divide it and get average).
    my_dict = {}
    for element in input_list:
        # key is a tuple of the first 2 values of each sublist
        key = (element[0], element[1])
        if key not in my_dict:
            # If the key does not exist, add it.
            # Value is in form of third element from our sublist + counter. Since this is first value set counter to 1
            my_dict[key] = [element[2], 1]
        else:
            # If key does exist then increment our value and increment counter by 1
            my_dict[key][0] += element[2]
            my_dict[key][1] += 1

    # we have a dict so we will need to convert it to list (and on the way calculate averages)
    return _convert_my_dict_to_list(my_dict)


def _convert_my_dict_to_list(my_dict):
    """
    :param my_dict: dict, key is in form of tuple (string, int) and values are in form of list [int, int_counter]
    :return: list
    """
    my_list = []
    for key, value in my_dict.items():
        sublist = [key[0], key[1], value[0]/value[1]]
        my_list.append(sublist)
    return my_list

my_algorithm(x)

This returns:

[['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]

Your expected output was:

[["hello", 0, 6.5], ["hi", 0, 6], ["hello", 1, 1]]

If you really need integers, you can modify the _convert_my_dict_to_list function.
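One possible shape for that modification, as a sketch only: convert an average back to int when it has no fractional part. The helper name to_int_if_whole is hypothetical, and it is applied here to the returned list rather than inside _convert_my_dict_to_list to keep the example self-contained.

```python
def to_int_if_whole(value):
    """Return an int when the float has no fractional part, else the value unchanged."""
    return int(value) if float(value).is_integer() else value

averaged = [["hello", 0, 6.5], ["hi", 0, 6.0], ["hello", 1, 1.0]]
cleaned = [[first, second, to_int_if_whole(avg)] for first, second, avg in averaged]
print(cleaned)  # -> [['hello', 0, 6.5], ['hi', 0, 6], ['hello', 1, 1]]
```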

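Answer 3 (score: 2)

Here is my own variation on this theme: groupby without the expensive sort. I also changed the problem so that the input and output are lists of tuples, since these are fixed-size records: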

from itertools import groupby
from operator import itemgetter
from collections import defaultdict

data = [("hello", 0, 5), ("hi", 0, 6), ("hello", 0, 8), ("hello", 1, 1)]

dictionary = defaultdict(complex)

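for key, group in groupby(data, itemgetter(slice(2))):
    # Add value + 1j per record so the imaginary part counts every record,
    # including adjacent records that groupby merges into a single group
    dictionary[key] += sum(complex(value, 1) for (string, number, value) in group)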

array = [(*key, value.real / value.imag) for key, value in dictionary.items()]

print(array)

Output

> python3 test.py
[('hello', 0, 6.5), ('hi', 0, 6.0), ('hello', 1, 1.0)]
>

Thanks to @wjandrea for suggesting itemgetter in place of the lambda. (Yes, I *am* using complex numbers to track the total and the count for the average.)