Here is my problem.
I have a list of lists, as follows:
linesort=[
['Me', 1, 596],
['Mine', 1, 551],
['Myself', 1, 533],
['Myself', 1, 624],
['Myself', 1, 656],
['Myself', 1, 928],
['Theirs', 1, 720],
['Theirs', 1, 1921],
['Them', 1, 716],
['Themselves', 1, 527]
]
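Each sublist records one trial in which a participant classified a word: the word itself, whether the response was correct or incorrect (second value), and the response time (third value). What I want to do is return another list of lists containing each word, the sum of the second values for that word, and the mean of the third values.
Basically, I need to compare the first element of each sublist and, where they are equal, compute the sum of the second elements and the average of the third elements.
While I am able to do this manually (i.e. by assigning and creating variables by hand), all my attempts to do it in a loop have failed. Given that I have two fairly large text files with this kind of data, I would appreciate a programmatic solution.
Some points that may be useful: I know in advance which words were used in each test, but I don't know where they will appear (or even whether they appear at all in any given set of stimuli). Can anyone help me with this?
I am using Python 2.6.5 on Ubuntu 10.04.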
Answer 0 (score: 3)
Not pretty, but:
from collections import defaultdict
linesort = [['Me', 1, 596], ['Mine', 1, 551], ['Myself', 1, 533], ['Myself', 1, 624],
            ['Myself', 1, 656], ['Myself', 1, 928], ['Theirs', 1, 720],
            ['Theirs', 1, 1921], ['Them', 1, 716], ['Themselves', 1, 527]]
d = defaultdict(list)
for line in linesort:
    d[line[0]].append(line[1:])
output = {}
for x, val in d.items():
    svals = [y[1] for y in val]
    # need to be modified if you need float value
    output[x] = [sum([y[0] for y in val]), sum(svals) / len(svals)]
print output
>>> {'Mine': [1, 551], 'Theirs': [2, 1320], 'Me': [1, 596], 'Them': [1, 716], 'Themselves': [1, 527], 'Myself': [4, 685]}
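The comment in the code above notes that the average is truncated to an integer. A minimal sketch of the same grouping with a float average follows; `float_output` is a name chosen here for illustration, and the explicit `float()` is just one way to force true division on both Python 2 and Python 3.

```python
from collections import defaultdict

linesort = [
    ['Me', 1, 596], ['Mine', 1, 551], ['Myself', 1, 533], ['Myself', 1, 624],
    ['Myself', 1, 656], ['Myself', 1, 928], ['Theirs', 1, 720],
    ['Theirs', 1, 1921], ['Them', 1, 716], ['Themselves', 1, 527],
]

# Group the (correct, response time) pairs by word.
grouped = defaultdict(list)
for word, correct, rt in linesort:
    grouped[word].append((correct, rt))

# float_output is a hypothetical name; float() avoids integer truncation.
float_output = {}
for word, pairs in grouped.items():
    times = [rt for _, rt in pairs]
    total_correct = sum(c for c, _ in pairs)
    float_output[word] = [total_correct, sum(times) / float(len(times))]

print(float_output['Myself'])  # [4, 685.25]
```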
Or use groupby (note that it is not the most efficient approach and that it requires the input list to be sorted first):
from itertools import groupby
res = {}
for key, gen in groupby(sorted(linesort), key=lambda x: x[0]):
    val = list(gen)
    svals = [y[2] for y in val]
    res[key] = [sum([y[1] for y in val]), sum(svals) / float(len(svals))]
All of my examples so far return a dictionary, though, so if you want a list of lists instead, the code needs a small change:
from itertools import groupby
res = []
for key, gen in groupby(sorted(linesort), key=lambda x: x[0]):
    val = list(gen)
    svals = [y[2] for y in val]
    res.append([key, sum([y[1] for y in val]), sum(svals) / float(len(svals))])
print res
>>> [['Me', 1, 596.0], ['Mine', 1, 551.0], ['Myself', 4, 685.25], ['Theirs', 2, 1320.5], ['Them', 1, 716.0], ['Themselves', 1, 527.0]]
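To illustrate why the groupby approach needs sorted input: `itertools.groupby` only merges consecutive items that share a key. The sketch below uses a small, deliberately shuffled subset of the question's data (the `rows` list and the variable names are made up for this example) and sorts on the word alone via `operator.itemgetter`.

```python
from itertools import groupby
from operator import itemgetter

# Deliberately unsorted sample rows (a shuffled subset of the question's data).
rows = [['Myself', 1, 533], ['Me', 1, 596], ['Myself', 1, 624]]

by_word = itemgetter(0)

# Without sorting, 'Myself' shows up as two separate groups.
unsorted_keys = [key for key, _ in groupby(rows, key=by_word)]

# After sorting on the same key, each word forms exactly one group.
sorted_keys = [key for key, _ in groupby(sorted(rows, key=by_word), key=by_word)]

print(unsorted_keys)  # ['Myself', 'Me', 'Myself']
print(sorted_keys)    # ['Me', 'Myself']
```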
Answer 1 (score: 1)
My verbose solution:
#!/usr/bin/env python
import collections
linesort = [['Me', 1, 596], ['Mine', 1, 551], ['Myself', 1, 533], ['Myself', 1, 624],
            ['Myself', 1, 656], ['Myself', 1, 928], ['Theirs', 1, 720], ['Theirs', 1, 1921],
            ['Them', 1, 716], ['Themselves', 1, 527]]
new = []
d = collections.defaultdict(list)
for i in linesort:
    d[i[0]].append(i[1:])
for k, v in d.iteritems():
    s = sum([i[0] for i in v])
    avg = sum([i[1] for i in v]) / len(v)
    new.append([k, s, avg])
for i in new:
    print i
Output:
['Me', 1, 596]
['Myself', 4, 685]
['Theirs', 2, 1320]
['Mine', 1, 551]
['Themselves', 1, 527]
['Them', 1, 716]
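The rows above come out in dictionary order, which is arbitrary on Python 2. Since the question says the words are known in advance, one possible follow-up is to emit the rows in that known order and note which words never appeared. This is only a sketch: `expected_words` and the extra word 'Ours' are hypothetical and not part of the original data.

```python
# Aggregated rows as printed above (arbitrary dictionary order).
new = [['Me', 1, 596], ['Myself', 4, 685], ['Theirs', 2, 1320],
       ['Mine', 1, 551], ['Themselves', 1, 527], ['Them', 1, 716]]

# Hypothetical list of words known to be in the test; 'Ours' is an
# invented word that does not occur in the data.
expected_words = ['Me', 'Mine', 'Myself', 'Ours', 'Theirs', 'Them', 'Themselves']

# dict() over a generator also works on Python 2.6 (no dict comprehensions there).
by_word = dict((row[0], row) for row in new)

ordered = [by_word[w] for w in expected_words if w in by_word]
missing = [w for w in expected_words if w not in by_word]

for row in ordered:
    print(row)
print(missing)  # ['Ours']
```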
Answer 2 (score: 1)
Here is my simple solution:
#!/usr/bin/python
linesort = [['Me', 1, 596], ['Mine', 1, 551], ['Myself', 1, 533], ['Myself', 1, 624],
            ['Myself', 1, 656], ['Myself', 1, 928], ['Theirs', 1, 720], ['Theirs', 1, 1921],
            ['Them', 1, 716], ['Themselves', 1, 527]]
cnts = {}
sums = {}
# here we count occurrences of each word (cnts),
# and we compute the sum of the second elements of each input list
for row in linesort:
    cnts[row[0]] = cnts.get(row[0], 0) + 1
    sums[row[0]] = sums.get(row[0], 0) + row[1]
# now that we know the occurrences of each word we can compute
# the averages of the third elements of each input list
# (on Python 2, "/" on ints truncates each term before it is added)
avgs = {}
for row in linesort:
    avgs[row[0]] = avgs.get(row[0], 0) + row[2] / cnts[row[0]]
# we build the result as a list of lists
result = []
for word in avgs:
    result.append([word, sums[word], avgs[word]])
print result
The output is:
[['Me', 1, 596], ['Myself', 4, 685], ['Theirs', 2, 1320], ['Mine', 1, 551], ['Themselves', 1, 527], ['Them', 1, 716]]
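The solution above divides each response time by the count before adding it, so with integer division every term is truncated separately. A possible single-pass variation keeps running totals per word and divides once at the end. This is a sketch, not the answerer's code; the `totals` layout and the variable names are assumptions made here.

```python
linesort = [
    ['Me', 1, 596], ['Mine', 1, 551], ['Myself', 1, 533], ['Myself', 1, 624],
    ['Myself', 1, 656], ['Myself', 1, 928], ['Theirs', 1, 720],
    ['Theirs', 1, 1921], ['Them', 1, 716], ['Themselves', 1, 527],
]

# totals maps word -> [count, sum of second values, sum of third values].
totals = {}
for word, correct, rt in linesort:
    entry = totals.setdefault(word, [0, 0, 0])
    entry[0] += 1
    entry[1] += correct
    entry[2] += rt

# Divide once per word; float() gives a true mean on Python 2 and 3.
# sorted() on the items orders the rows alphabetically by word.
result = [[word, correct_sum, time_sum / float(count)]
          for word, (count, correct_sum, time_sum) in sorted(totals.items())]

print(result)
```

Because only three numbers are stored per word, memory use depends on the number of distinct words rather than on the number of rows, which may matter for the large files mentioned in the question.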