I have a list in the following format:
x = [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
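Algorithm:
Combine the inner lists that share the same first two values, so "hello",0,5 is combined with "hello",0,8 but not with "hello",1,1.
The third value becomes the average of the third values: sum(all 3rd vals) / len(all 3rd vals)
Here "all 3rd vals" means the third value of each inner list in a group of duplicates, so "hello",0,5 and "hello",0,8 become "hello",0,6.5.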
Desired output (the order of the list does not matter):
x = [["hello",0,6.5], ["hi",0,6], ["hello",1,1]]
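Since the order of the output does not matter, comparing a candidate result against the desired output needs an order-insensitive check. A minimal sketch, assuming a hypothetical helper named same_rows (not part of the question):

```python
def same_rows(a, b):
    # Hypothetical helper: compare two results while ignoring row order.
    # Rows are converted to tuples so they can be sorted and compared.
    return sorted(map(tuple, a)) == sorted(map(tuple, b))

expected = [["hello", 0, 6.5], ["hi", 0, 6], ["hello", 1, 1]]
candidate = [["hello", 1, 1.0], ["hello", 0, 6.5], ["hi", 0, 6.0]]
print(same_rows(expected, candidate))  # True, since 6 == 6.0 and 1 == 1.0
```

Note that this treats 6 and 6.0 as equal, which matters because the answers below return floats for every average.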
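Question:
Ideally the solution would be efficient, since it will be used on very large lists.
If anything is unclear, let me know and I will explain.
Edit: I tried converting the list to a set to remove duplicates, but that does not take the third value of the inner lists into account, so it did not work.
Thanks to everyone who offered a solution to this problem! Here are the results of a speed test of all the functions: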
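To illustrate why the set approach falls short, here is a small sketch (the variable names as_set and keys are illustrative only). A set removes only rows that are identical in all three positions, while the rows that need merging differ in their third value:

```python
x = [["hello", 0, 5], ["hi", 0, 6], ["hello", 0, 8], ["hello", 1, 1]]

# Lists are unhashable, so each inner list must become a tuple first.
as_set = {tuple(sub) for sub in x}

# ("hello", 0, 5) and ("hello", 0, 8) differ in the third value,
# so the set keeps both and nothing is merged.
print(len(as_set))  # 4

# Keying on the first two values shows how many merged rows are expected.
keys = {tuple(sub[:2]) for sub in x}
print(len(keys))  # 3
```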
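Answer 0 (score: 2)
I figured out how to improve my previous code (see the original below). You can keep a running total and count, then compute the average at the end, which avoids storing all the individual numbers.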
from collections import defaultdict

class RunningAverage:
    def __init__(self):
        self.total = 0
        self.count = 0

    def add(self, value):
        self.total += value
        self.count += 1

    def calculate(self):
        return self.total / self.count

def func(lst):
    thirds = defaultdict(RunningAverage)
    for sub in lst:
        k = tuple(sub[:2])
        thirds[k].add(sub[2])
    lst_out = [[*k, v.calculate()] for k, v in thirds.items()]
    return lst_out

print(func(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
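Original answer: This probably is not very efficient, since it has to accumulate all the values in order to average them. I think you could get around that with a running average that factors in a weighting, but I am not sure how to do that.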
from collections import defaultdict

def avg(nums):
    return sum(nums) / len(nums)

def func(lst):
    thirds = defaultdict(list)
    for sub in lst:
        k = tuple(sub[:2])
        thirds[k].append(sub[2])
    lst_out = [[*k, avg(v)] for k, v in thirds.items()]
    return lst_out

print(func(x))  # -> [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
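The running average with weighting that the original answer was unsure about could look roughly like the sketch below. It is only one possible reading of that idea: each group stores its current mean and count, and every new value shifts the mean by (value - mean) / count. The function name running_mean_func is hypothetical.

```python
from collections import defaultdict

def running_mean_func(lst):
    # key -> [current_mean, count]; hypothetical helper, not from the answer.
    stats = defaultdict(lambda: [0.0, 0])
    for first, second, value in lst:
        entry = stats[(first, second)]
        entry[1] += 1
        # Incremental update: new_mean = old_mean + (value - old_mean) / n
        entry[0] += (value - entry[0]) / entry[1]
    return [[*key, mean] for key, (mean, _count) in stats.items()]

x = [["hello", 0, 5], ["hi", 0, 6], ["hello", 0, 8], ["hello", 1, 1]]
result = running_mean_func(x)
print(result)  # [['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
```

Compared with keeping a total and a count, this form never holds a large sum, at the cost of one division per element.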
Answer 1 (score: 2)
You can try using groupby.
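from itertools import groupby

m = [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]

# groupby only merges adjacent items, so sort by the same key first.
# A tuple key avoids the collisions that string concatenation can cause,
# e.g. ("a1", 0) and ("a", 10) would both become "a10".
key = lambda x: (x[0], x[1])
m.sort(key=key)
for k, group in groupby(m, key):
    ss = 0
    c = 0
    for item in group:
        ss += item[2]
        c += 1
    print([k[0], k[1], ss / c])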
Answer 2 (score: 2)
This should be O(N); someone please correct me if I am wrong:
def my_algorithm(input_list):
    """
    :param input_list: list of lists in format [string, int, int]
    :return: list
    """
    # Dict in format (string, int): [int, count_int]
    # So our list is in this format, example:
    # [["hello",0,5], ["hi",0,6], ["hello",0,8], ["hello",1,1]]
    # so for our dict we will make keys a tuple of the first 2 values of each sublist (since that needs to be unique)
    # while values are a list of the third element from our sublist + counter (which counts every time we have a
    # duplicate key, so we can divide by it and get the average).
    my_dict = {}
    for element in input_list:
        # key is a tuple of the first 2 values of each sublist
        key = (element[0], element[1])
        if key not in my_dict:
            # If the key does not exist, add it.
            # Value is in the form of third element from our sublist + counter. Since this is the first value, set counter to 1
            my_dict[key] = [element[2], 1]
        else:
            # If the key does exist, add to our running sum and increment the counter by 1
            my_dict[key][0] += element[2]
            my_dict[key][1] += 1
    # we have a dict, so we will need to convert it to a list (and on the way calculate averages)
    return _convert_my_dict_to_list(my_dict)

def _convert_my_dict_to_list(my_dict):
    """
    :param my_dict: dict, key is in form of tuple (string, int) and values are in form of list [int, int_counter]
    :return: list
    """
    my_list = []
    for key, value in my_dict.items():
        sublist = [key[0], key[1], value[0]/value[1]]
        my_list.append(sublist)
    return my_list

my_algorithm(x)
This returns:
[['hello', 0, 6.5], ['hi', 0, 6.0], ['hello', 1, 1.0]]
Your expected output was:
[["hello", 0, 6.5], ["hi", 0, 6], ["hello", 1, 1]]
If you really need integers, you can modify the _convert_my_dict_to_list function.
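One way that modification might look is sketched below. The name convert_with_ints is hypothetical, and the choice to convert only whole-number averages back to int is an assumption about what "need integers" means here:

```python
def convert_with_ints(my_dict):
    # Hypothetical variant of _convert_my_dict_to_list: averages that are
    # whole numbers are returned as int, all others stay float.
    my_list = []
    for key, (total, count) in my_dict.items():
        average = total / count  # true division always yields a float
        if average.is_integer():
            average = int(average)
        my_list.append([key[0], key[1], average])
    return my_list

# Same (sum, count) pairs that my_algorithm would build for the example input.
sample = {("hello", 0): [13, 2], ("hi", 0): [6, 1], ("hello", 1): [1, 1]}
converted = convert_with_ints(sample)
print(converted)  # [['hello', 0, 6.5], ['hi', 0, 6], ['hello', 1, 1]]
```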
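Answer 3 (score: 2)
Here is my own variation on this theme: a groupby without the expensive sort. I also changed the problem so that the input and output are lists of tuples, since these are fixed-size records: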
from itertools import groupby
from operator import itemgetter
from collections import defaultdict

data = [("hello", 0, 5), ("hi", 0, 6), ("hello", 0, 8), ("hello", 1, 1)]

dictionary = defaultdict(complex)

for key, group in groupby(data, itemgetter(slice(2))):
    values = [value for (string, number, value) in group]
    # real part accumulates the total, imaginary part the count;
    # count every element so adjacent duplicates are averaged correctly
    dictionary[key] += complex(sum(values), len(values))

array = [(*key, value.real / value.imag) for key, value in dictionary.items()]

print(array)
Output
> python3 test.py
[('hello', 0, 6.5), ('hi', 0, 6.0), ('hello', 1, 1.0)]
>
Thanks to @wjandrea for suggesting itemgetter in place of lambda. (And yes, I am using complex numbers to track the total and the count for the average.)
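As a small illustration of why the dictionary is still needed when groupby runs on unsorted input, the sketch below lists the keys that groupby yields for the example data. Because groupby only merges adjacent items, the same key can show up in more than one group (the name group_keys is illustrative only):

```python
from itertools import groupby
from operator import itemgetter

data = [("hello", 0, 5), ("hi", 0, 6), ("hello", 0, 8), ("hello", 1, 1)]

# itemgetter(slice(2)) returns the first two fields of each record.
group_keys = [key for key, _group in groupby(data, itemgetter(slice(2)))]
print(group_keys)
# [('hello', 0), ('hi', 0), ('hello', 0), ('hello', 1)]
```

The key ('hello', 0) appears twice, so the per-group totals have to be accumulated across groups before averaging.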