For this problem I am working with a large list that was imported from a CSV file, but let's say I have a list like this:
[['name','score1','score2','score3','score4'],
 ['Mike','5','1','6','2'],
 ['Mike','1','1','1','1'],
 ['Mike','3','0','3','0'],
 ['jose','0','1','2','3'],
 ['jose','2','3','4','5'],
 ['lisa','4','4','4','4']]
I would like to get another list of this form (the sum of all scores for each student):
[['Mike','9','2','10','3'],
 ['jose','2','4','6','8'],
 ['lisa','4','4','4','4']]
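Any ideas on how to do this? I have tried many approaches but could not get it to work. I get stuck when the same name appears more than twice: my solution only keeps the sum of the last two rows. I am new to Python and to programming.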
Answer 0 (score: 1)
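If you are just learning Python, I always recommend trying to implement things without relying on external libraries. A good first step is to break the problem down into smaller components:
Remove the first entry (the header row) from the input list.
For each remaining entry: if its name is already in the output list, merge its scores into that entry; otherwise append the entry to the output list.
One possible implementation is as follows (untested):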
input_list = [['name','score1','score2','score3','score4'],
              ['Mike','5','1','6','2'],
              ['Mike','1','1','1','1'],
              ['Mike','3','0','3','0'],
              ['jose','0','1','2','3'],
              ['jose','2','3','4','5'],
              ['lisa','4','4','4','4']]
print(input_list)
# Remove the first element (the header row)
input_list = input_list[1:]
# Initialize an empty output list
output_list = []
# Iterate through each entry in the input
for val in input_list:
    # Determine if key is already in output list
    for ent in output_list:
        if ent[0] == val[0]:
            # The value is already in the output list (so merge them)
            for i in range(1, len(ent)):
                # We convert to int and back to str
                # This could be done elsewhere (or not at all...)
                ent[i] = str(int(ent[i]) + int(val[i]))
            break
    else:
        # The value wasn't in the output list (so add it)
        # This is a useful feature of the for loop, the following
        # is only executed if the break command wasn't reached above
        output_list.append(val)
#print(input_list)
print(output_list)
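The above is less efficient than using a dictionary, or importing a library that can do the same thing in a few lines, but it demonstrates some features of the language. Be careful when working with lists: the code above modifies the input list, because output_list holds the same row objects rather than copies (try uncommenting the print statement for the input list at the end).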
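The dictionary approach mentioned above could look roughly like the sketch below. It is not part of the original answer: the function name sum_scores is made up for illustration, and it assumes Python 3, a header in the first row, and scores that all parse as integers. It builds new rows rather than editing the existing ones, so the input list is left untouched.

```python
def sum_scores(rows):
    """Sum the score columns per name (hypothetical helper; rows[0] is assumed to be a header)."""
    totals = {}  # name -> list of int totals; dicts keep insertion order in Python 3.7+
    for name, *scores in rows[1:]:
        values = [int(s) for s in scores]
        if name in totals:
            # Add column by column to the running totals for this name
            totals[name] = [a + b for a, b in zip(totals[name], values)]
        else:
            totals[name] = values
    # Convert back to the list-of-lists-of-strings shape used in the question
    return [[name] + [str(t) for t in sums] for name, sums in totals.items()]


input_list = [['name', 'score1', 'score2', 'score3', 'score4'],
              ['Mike', '5', '1', '6', '2'],
              ['Mike', '1', '1', '1', '1'],
              ['Mike', '3', '0', '3', '0'],
              ['jose', '0', '1', '2', '3'],
              ['jose', '2', '3', '4', '5'],
              ['lisa', '4', '4', '4', '4']]
output_list = sum_scores(input_list)
print(output_list)
```

Each name is looked up in the dictionary directly, so there is no inner loop over the output list for every input row.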
Answer 1 (score: 0)
Say you have
In [45]: temp
Out[45]:
[['Mike', '5', '1', '6', '2'],
['Mike', '1', '1', '1', '1'],
['Mike', '3', '0', '3', '0'],
['jose', '0', '1', '2', '3'],
['jose', '2', '3', '4', '5'],
['lisa', '4', '4', '4', '4']]
Then you can use pandas:
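import pandas as pd
temp = pd.DataFrame(temp)
def test(m):
    # Convert to int where possible; names stay as strings
    try:
        return int(m)
    except (TypeError, ValueError):
        return m
# Newer pandas releases deprecate applymap in favor of DataFrame.map
temp = temp.applymap(test)
print(temp.groupby(0).agg(sum))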
If you want to import the data from a CSV file, you can read the file directly with pd.read_csv.
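Answer 2 (score: 0)
You can use one of the better solutions already suggested, but if you want to implement it yourself and learn, you can follow the code below; I explain it in the comments: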
# utilities for iteration. groupby makes groups from a collection
from itertools import groupby
# implementation of common, simple operations such as
# multiplication, getting an item from a list
from operator import itemgetter
def my_sum(groups):
    return [
        ls[0] if i == 0 else str(sum(map(int, ls)))  # keep first one since it's name, sum otherwise
        for i, ls in enumerate(zip(*groups))  # transpose elements and give number to each
    ]
# list comprehension to make a list from another list
# group lists according to first element and apply our function on grouped elements
# groupby reveals group key and elements but key isn't needed so it's set to underscore
result = [my_sum(g) for _, g in groupby(ls, key=itemgetter(0))]
To understand this code you need to know about list comprehensions, the * operator, the built-ins (int, enumerate, map, str, zip), and a couple of handy modules, itertools and operator.
Your edit that added the header row breaks this code, so the header has to be dropped: pass ls[1:] to groupby instead of ls. Hope it helps.
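Putting the pieces of this answer together gives roughly the following runnable sketch. One addition that the answer does not mention: itertools.groupby only groups consecutive items with equal keys, so the rows are sorted by name first. The variable names rows and data are chosen for illustration, and the rows are deliberately interleaved here to show why the sort matters.

```python
from itertools import groupby
from operator import itemgetter


def my_sum(group):
    # Transpose the rows of one group into columns; column 0 holds the name,
    # every other column holds score strings that are summed as ints.
    return [
        col[0] if i == 0 else str(sum(map(int, col)))
        for i, col in enumerate(zip(*group))
    ]


# Same data as in the question, with rows interleaved on purpose
rows = [['name', 'score1', 'score2', 'score3', 'score4'],
        ['Mike', '5', '1', '6', '2'],
        ['jose', '0', '1', '2', '3'],
        ['Mike', '1', '1', '1', '1'],
        ['lisa', '4', '4', '4', '4'],
        ['jose', '2', '3', '4', '5'],
        ['Mike', '3', '0', '3', '0']]

# Drop the header, then sort by name so equal names sit next to each other
data = sorted(rows[1:], key=itemgetter(0))
result = [my_sum(g) for _, g in groupby(data, key=itemgetter(0))]
print(result)
```

If the input is already grouped by name, as in the question, the sort can be skipped and the original row order is preserved.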
Answer 3 (score: 0)
This is a good fit for collections.Counter:
from collections import Counter, defaultdict
csvdata = [['name','score1','score2','score3','score4'],
['Mike','5','1','6','2'],
['Mike','1','1','1','1'],
['Mike','3','0','3','0'],
['jose','0','1','2','3'],
['jose','2','3','4','5'],
['lisa','4','4','4','4']]
student_scores = defaultdict(Counter)
score_titles = csvdata[0][1:]
for row in csvdata[1:]:
    student = row[0]
    scores = dict(zip(score_titles, map(int, row[1:])))
    student_scores[student] += Counter(scores)
print(student_scores["Mike"])
# >>> Counter({'score3':10, 'score1':9, 'score4':3, 'score2':2})
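To get from the Counter objects back to the list-of-lists shape asked for in the question, something like the sketch below should work. The name summed is made up for illustration. One detail to be aware of: adding Counters with + or += drops entries whose count is zero or negative, so a column whose total is 0 disappears from the Counter. Looking up a missing key in a Counter returns 0, which is why indexing by title still gives the right value.

```python
from collections import Counter, defaultdict

csvdata = [['name', 'score1', 'score2', 'score3', 'score4'],
           ['Mike', '5', '1', '6', '2'],
           ['Mike', '1', '1', '1', '1'],
           ['Mike', '3', '0', '3', '0'],
           ['jose', '0', '1', '2', '3'],
           ['jose', '2', '3', '4', '5'],
           ['lisa', '4', '4', '4', '4']]

score_titles = csvdata[0][1:]
student_scores = defaultdict(Counter)
for row in csvdata[1:]:
    student_scores[row[0]] += Counter(dict(zip(score_titles, map(int, row[1:]))))

# Rebuild rows in header order; totals[title] is 0 for any dropped entry
summed = [[student] + [str(totals[title]) for title in score_titles]
          for student, totals in student_scores.items()]
print(summed)
```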
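Answer 4 (score: 0)
As a beginner, I would consider converting your data into a simpler structure such as a dictionary, so that you only need to sum a list of lists. Assuming you have removed the header row, you can convert it to a dictionary like this: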
>>> data_dict = {}
>>> for row in data:
...     data_dict.setdefault(row[0], []).append([int(i) for i in row[1:]])
>>> data_dict
{'Mike': [[5, 1, 6, 2], [1, 1, 1, 1], [3, 0, 3, 0]],
'jose': [[0, 1, 2, 3], [2, 3, 4, 5]],
'lisa': [[4, 4, 4, 4]]}
Now it should be relatively easy to loop over the dict and sum up the lists (you may want to look at sum and zip as ways to do this).
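One way the remaining step could go, as a sketch rather than the answer's own code: zip(*score_rows) transposes a student's rows into columns, and sum adds up each column. The name totals is chosen here for illustration, and data is assumed to be the question's list with the header row already removed.

```python
# The question's rows, header already removed
data = [['Mike', '5', '1', '6', '2'],
        ['Mike', '1', '1', '1', '1'],
        ['Mike', '3', '0', '3', '0'],
        ['jose', '0', '1', '2', '3'],
        ['jose', '2', '3', '4', '5'],
        ['lisa', '4', '4', '4', '4']]

data_dict = {}
for row in data:
    data_dict.setdefault(row[0], []).append([int(i) for i in row[1:]])

# For each student, transpose the rows into columns and sum each column
totals = {name: [sum(col) for col in zip(*score_rows)]
          for name, score_rows in data_dict.items()}
print(totals)
```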