I wrote a Python script that parses a large amount of data and cleans it up as in the following example:
input_list = [[a,b,c,0.5], [a,b,d,1], [a,b,e,1], [a,b,c,0.3], [a,b,c,0.2], [a,b,f,0.6], [a,b,f,0.4], [a,b,g,1]]
output_list = [[a,b,c,1], [a,b,d,1], [a,b,e,1], [a,b,f,1], [a,b,g,1]]
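So basically, whenever the first 3 elements of several lists are identical, only one list is kept and the values (the 4th element) are summed.
I used nested "for" loops and a lot of "if" statements, but I'm wondering whether there is a better way to do this in Python (preferably v2).
I'm not asking for code here, just some suggestions so that I can learn and improve my code.
Cheers.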
Answer 0 (score: 4)
input_list = [['a','b','c',0.5], ['a','b','d',1], ['a','b','e',1], ['a','b','c',0.3], ['a','b','c',0.2], ['a','b','f',0.6], ['a','b','f',0.4], ['a','b','g',1]]
output_list = []
d = {}
for i in input_list:
    # The first three elements identify the group
    key = (i[0], i[1], i[2])
    # Add the 4th element to the running total (0.0 if the key is new)
    d[key] = i[3] + d.get(key, 0.0)
# items() works in both Python 2 and 3; iteritems() exists only in Python 2
for k, v in d.items():
    output_list.append([k[0], k[1], k[2], v])
# print(output_list)
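Answer 1 (score: 1)
Since you don't know how many elements will match, and you need to keep track of what has been found so far for a given key, it makes sense to use a dict as the intermediate data type.
Here is a working solution: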
totals = {}
for a, b, c, x in input_list:
    key = (a, b, c)
    if key in totals:
        totals[key] += x
    else:
        totals[key] = x
result = [[k[0], k[1], k[2], v] for k, v in totals.items()]
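What this does: it walks through the input once, uses the first three elements of each list as a dict key, accumulates the fourth element under that key, and finally turns each key/total pair back into a list.
In Python 3, the last line could be written more neatly: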
result = [[*k, v] for k, v in totals.items()]
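As a possible variation on the loop above (not part of the original answer), the if/else branch can be dropped by letting collections.defaultdict supply a starting total of 0.0 for unseen keys. The function name merge_rows is made up for this illustration, and the sketch assumes every row has exactly four elements.

```python
from collections import defaultdict


def merge_rows(rows):
    """Sum the 4th element of all rows that share the same first three elements.

    merge_rows is a hypothetical helper name used only for this sketch.
    """
    totals = defaultdict(float)  # missing keys start at 0.0
    for a, b, c, x in rows:
        totals[(a, b, c)] += x
    # Turn each (key, total) pair back into a four-element list
    return [list(key) + [total] for key, total in totals.items()]


input_list = [['a', 'b', 'c', 0.5], ['a', 'b', 'd', 1], ['a', 'b', 'e', 1],
              ['a', 'b', 'c', 0.3], ['a', 'b', 'c', 0.2], ['a', 'b', 'f', 0.6],
              ['a', 'b', 'f', 0.4], ['a', 'b', 'g', 1]]
result = merge_rows(input_list)
print(result)
```

Because defaultdict(float) starts from 0.0, every total comes back as a float, even for groups whose values were all integers.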
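Answer 2 (score: 1)
This is a good use case for itertools.groupby, which works in both Python 2 and Python 3.
We basically group together all elements that share the same first 3 elements, sum the 4th element within each group, and then build the result list.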
from itertools import groupby
input_list = [['a','b','c',0.5], ['a','b','d',1], ['a','b','e',1], ['a','b','c',0.3], ['a','b','c',0.2], ['a','b','f',0.6], ['a','b','f',0.4], ['a','b','g',1]]
# Sort the input list based on first three elements
input_list = sorted(input_list, key=lambda x: x[:3])
res = []
# Group the input list based on first three elements
for model, group in groupby(input_list, key=lambda x: x[:3]):
    # Sum up the 4th element for the same first 3 elements and cast to int
    fourth_val = int(sum([item[3] for item in group]))
    # Create the list by adding the common first 3 elements with the sum
    res.append(model + [fourth_val])
print(res)
The output will be
[['a', 'b', 'c', 1], ['a', 'b', 'd', 1],
['a', 'b', 'e', 1], ['a', 'b', 'f', 1],
['a', 'b', 'g', 1]]
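The sort step matters because itertools.groupby only merges items that sit next to each other. A small sketch, using a shortened three-row sample invented for this illustration, shows how the same key appears twice when the rows are left unsorted:

```python
from itertools import groupby


def first_three(row):
    # Grouping key: the first three elements of a row
    return row[:3]


# Shortened sample data, made up for this illustration
rows = [['a', 'b', 'c', 0.5], ['a', 'b', 'd', 1], ['a', 'b', 'c', 0.3]]

# Without sorting, the two ['a', 'b', 'c'] rows are not adjacent,
# so groupby reports that key twice
unsorted_keys = [key for key, _ in groupby(rows, key=first_three)]

# After sorting on the same key, each key shows up exactly once
sorted_keys = [key for key, _ in groupby(sorted(rows, key=first_three), key=first_three)]

print(unsorted_keys)  # [['a', 'b', 'c'], ['a', 'b', 'd'], ['a', 'b', 'c']]
print(sorted_keys)    # [['a', 'b', 'c'], ['a', 'b', 'd']]
```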
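Another approach is to use a dictionary whose keys are the first 3 elements of each list and whose values are the sums of the 4th elements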
input_list = [['a','b','c',0.5], ['a','b','d',1], ['a','b','e',1], ['a','b','c',0.3], ['a','b','c',0.2], ['a','b','f',0.6], ['a','b','f',0.4], ['a','b','g',1]]
dct = {}
# Iterate through input list
for x, y, z, a in input_list:
    # Take the first 3 elements as the key
    k = x, y, z
    # Add up 4th value for common first 3 elements
    dct.setdefault(k, 0)
    dct[k] = a + dct[k]
# Convert dictionary back to list
res = [[x, y, z, int(v)] for (x, y, z), v in dct.items()]
print(res)
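One caveat with the int(...) cast used above: int() truncates toward zero, and a sum of floats can land a hair below a whole number, so a total that should be 1 could in principle come out as 0. A minimal sketch of one alternative, assuming that rounding to 9 decimal places is acceptable for the data (the helper name tidy_total and the default of 9 digits are choices made for this illustration, not something from the answer):

```python
def tidy_total(value, ndigits=9):
    """Round away tiny floating-point error; return an int when the result is whole.

    tidy_total and ndigits=9 are assumptions made for this sketch.
    """
    rounded = round(value, ndigits)
    if float(rounded).is_integer():
        return int(rounded)
    return rounded


print(tidy_total(0.7 + 0.2 + 0.1))  # whole after rounding, so an int is returned
print(tidy_total(0.5 + 0.25))       # stays a float
```

In the dictionary version above, this would replace int(v) with tidy_total(v) in the final list comprehension.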
Answer 3 (score: 0)
I think I've found an answer for you
input_list = [["a", "b", "c", 0.5], ["a", "b", "d", 1], ["a", "b", "e", 1],
              ["a", "b", "c", 0.3], ["a", "b", "c", 0.2], ["a", "b", "f", 0.6],
              ["a", "b", "f", 0.4], ["a", "b", "g", 1]]
output_list = []
for i in range(len(input_list)):
    # First three elements of every row collected so far
    letters = [x[0:3] for x in output_list]
    if input_list[i][0:3] in letters:
        used = letters.index(input_list[i][0:3])
        output_list[used][3] += input_list[i][3]
    else:
        # Append a copy so the += above does not modify input_list
        output_list.append(list(input_list[i]))
print(output_list)
Output:
[['a', 'b', 'c', 1.0], ['a', 'b', 'd', 1], ['a', 'b', 'e', 1], ['a', 'b', 'f', 1.0], ['a', 'b', 'g', 1]]
Answer 4 (score: 0)
How about something like this?
def foo(input_list):
    seen = {}
    for x1, x2, x3, x4 in input_list:
        seen[(x1, x2, x3)] = seen.setdefault((x1, x2, x3), 0) + x4
    return [[x1, x2, x3, x4] for (x1, x2, x3), x4 in seen.items()]
In modern Python you can unpack values with something like *x, y = 1,2,3,4, but I don't think older versions of Python have this feature.
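For illustration, here is roughly how foo might look with that starred unpacking. This is a sketch for Python 3 only (the syntax is not available in Python 2), and the name foo_py3 is made up for the example:

```python
def foo_py3(input_list):
    """Python 3 variant of foo using starred unpacking (hypothetical name)."""
    seen = {}
    for *key, value in input_list:
        # key arrives as a list; convert it to a tuple so it can be a dict key
        key = tuple(key)
        seen[key] = seen.get(key, 0) + value
    # Rebuild each row from its key and accumulated total
    return [[*key, total] for key, total in seen.items()]


input_list = [['a', 'b', 'c', 0.5], ['a', 'b', 'd', 1], ['a', 'b', 'e', 1],
              ['a', 'b', 'c', 0.3], ['a', 'b', 'c', 0.2], ['a', 'b', 'f', 0.6],
              ['a', 'b', 'f', 0.4], ['a', 'b', 'g', 1]]
merged = foo_py3(input_list)
print(merged)
```

A side effect of the starred form is that the key length is no longer hard-coded: everything before the last element of a row becomes the key.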