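I have a file containing the names, scores and exam numbers (in that order) of many students, and I want to find out on which exam each student got their best score (scores range from 1 to 5, with 1 being the best score). Some students may have taken only one exam, others several. The file looks like this: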
student1,4.2,1
student2,1.02,1
student3,4.1,1
student4,2.089,1
student2,3.02,2
student3,2.54,2
student4,3.69,2
student5,1.34,2
I plan to build a dictionary containing the names, exam numbers and scores, and then retrieve the best score. My code looks like this:
import re

with open('filename.csv') as f:
    lines = f.readlines()
scores = {}  # { Name : { Exam_Number : score } }
for line in lines:
    n = re.match(r"(.*),(.*),(.*)", line)
    name = n.group(1)
    score = n.group(2)
    exam_number = n.group(3)
    scores[name] = { exam_number : score }  # HERE IS THE PROBABLE ERROR
# Obtain the best score per student and the number of the exam
best_exam = {}
for name in scores:
    for num in scores[name]:
        score = scores[name][num]
        if name in best_exam:
            for num_ext in best_exam[name]:
                # 1 is the best score, so a lower score replaces the stored one
                if float(score) < float(best_exam[name][num_ext]):
                    best_exam[name] = { num : score }
        else:
            best_exam[name] = { num : score }
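I've realized that whenever I try to add a new exam_number: score pair for a name that already exists, the pair previously stored for that name is deleted. For example, if I look up the scores of student4, only the score for exam 2 appears, because it was the last one read and the earlier one was overwritten. Is there a way to declare a dictionary with paired keys and then iterate over all possible pairs, given that some keys (but no pairs) may repeat?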
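The overwrite happens because `scores[name] = {...}` rebinds the whole inner dictionary on every line instead of adding a key to the existing one. The sketch below illustrates this with the sample rows hard-coded as tuples (an assumption, standing in for the parsed lines of filename.csv) and contrasts it with `dict.setdefault`, which returns the existing inner dictionary and only creates a new one the first time a name is seen.

```python
# Sample rows from the question, already split into (name, score, exam_number).
rows = [
    ("student1", "4.2", "1"),
    ("student2", "1.02", "1"),
    ("student3", "4.1", "1"),
    ("student4", "2.089", "1"),
    ("student2", "3.02", "2"),
    ("student3", "2.54", "2"),
    ("student4", "3.69", "2"),
    ("student5", "1.34", "2"),
]

# Plain assignment replaces scores[name] with a brand-new inner dict each time,
# so an earlier exam for the same student is lost.
overwritten = {}
for name, score, exam_number in rows:
    overwritten[name] = {exam_number: score}

# setdefault() returns the existing inner dict (creating it only once),
# so each new exam is stored next to the previous ones.
nested = {}
for name, score, exam_number in rows:
    nested.setdefault(name, {})[exam_number] = score

print(overwritten["student4"])  # {'2': '3.69'}
print(nested["student4"])       # {'1': '2.089', '2': '3.69'}
```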
Edit ---------------------------
The same question, phrased slightly differently (it may ring a bell for people familiar with both Python and Perl): is there a Python equivalent of Perl's multidimensional hashes?
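One common Python counterpart to Perl's autovivifying multidimensional hashes is a recursive `collections.defaultdict`, where every missing key produces another nested dictionary. The helper name `tree` below is hypothetical; for exactly two levels, `defaultdict(dict)` is enough.

```python
from collections import defaultdict


def tree():
    """Return a dict that creates nested dicts on demand, like Perl's $h{a}{b}."""
    return defaultdict(tree)


scores = tree()
scores["student4"]["1"] = 2.089  # the inner level is created automatically
scores["student4"]["2"] = 3.69
scores["student5"]["2"] = 1.34

print(dict(scores["student4"]))  # {'1': 2.089, '2': 3.69}
```

Note that merely reading a missing key (for example `scores["nobody"]`) also creates an empty entry, just as autovivification does in Perl.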
Answer 0 (score: 0)
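I think it's best to store the csv file as a list of lists. Then use itertools.groupby to group the rows by name and pick out the row with the highest score. Here is the source code: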
import csv
import itertools
import operator

# Read a csv file as a list of lists
with open('test.csv', 'r') as f:  # name, scores and exam number
    reader = csv.reader(f, delimiter=',')
    lists = [[row[0], float(row[1]), int(row[2])] for row in reader]

# Obtain the best score per student and the number of the exam
for k, g in itertools.groupby(sorted(lists, key=operator.itemgetter(0)), key=operator.itemgetter(0)):
    best_score = max(list(g), key=lambda x: x[1])
    print(best_score)
# Output
'''
['student1', 4.2, 1]
['student2', 3.02, 2]
['student3', 4.1, 1]
['student4', 3.69, 2]
['student5', 1.34, 2]
'''
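The answer above keeps the highest score, while the question defines 1 as the best score. A minimal variant of the same `groupby` approach that keeps the lowest score instead is sketched below; the rows are hard-coded from the question's sample (an assumption, in place of reading test.csv), and the results are collected in a hypothetical `best` dict rather than printed directly.

```python
import itertools
import operator

# Sample data from the question as [name, score, exam_number] rows;
# in the answer these come from csv.reader on test.csv.
lists = [
    ["student1", 4.2, 1],
    ["student2", 1.02, 1],
    ["student3", 4.1, 1],
    ["student4", 2.089, 1],
    ["student2", 3.02, 2],
    ["student3", 2.54, 2],
    ["student4", 3.69, 2],
    ["student5", 1.34, 2],
]

by_name = operator.itemgetter(0)
best = {}
# groupby only merges adjacent rows, so the rows must be sorted by name first.
for name, rows in itertools.groupby(sorted(lists, key=by_name), key=by_name):
    # 1 is the best score, so keep the row with the lowest score.
    best[name] = min(rows, key=operator.itemgetter(1))

for row in best.values():
    print(row)
```

Sorting first is essential: without it, `groupby` would emit student2 twice because its two rows are not adjacent in the file.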
Previous answer:
Use nested dictionaries by implementing Perl's autovivification feature:
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value
Create the dictionary once, before the reading loop:
scores = AutoVivification()
Then replace the line scores[name] = { exam_number : score } with:
scores[name][exam_number] = score
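Here is a self-contained sketch of the autovivification approach end to end. The sample rows are hard-coded as (name, score, exam_number) tuples (an assumption, standing in for the parsed file), and the final step picks the lowest score per student, following the question's rule that 1 is the best score.

```python
class AutoVivification(dict):
    """Implementation of perl's autovivification feature."""
    def __getitem__(self, item):
        try:
            return dict.__getitem__(self, item)
        except KeyError:
            value = self[item] = type(self)()
            return value


# Sample rows from the question as (name, score, exam_number) tuples.
rows = [
    ("student1", 4.2, 1),
    ("student2", 1.02, 1),
    ("student3", 4.1, 1),
    ("student4", 2.089, 1),
    ("student2", 3.02, 2),
    ("student3", 2.54, 2),
    ("student4", 3.69, 2),
    ("student5", 1.34, 2),
]

scores = AutoVivification()  # created once, outside the loop
for name, score, exam_number in rows:
    # The missing inner dict is created on first access, then filled in.
    scores[name][exam_number] = score

# Pick the exam with the lowest score per student (1 is the best score).
best_exam = {}
for name, exams in scores.items():
    exam_number = min(exams, key=exams.get)
    best_exam[name] = (exam_number, exams[exam_number])

print(scores["student4"])     # {1: 2.089, 2: 3.69}
print(best_exam["student4"])  # (1, 2.089)
```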
Answer 1 (score: 0)
You can use a tuple as the dict key, as long as there are no duplicate student/exam combinations:
best_sco[('student-name', 'exam')] = 'score'
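With tuple keys, the per-student best has to be found by scanning all keys, since a name alone is no longer a key. A minimal sketch, assuming the scores are already stored as floats under (name, exam_number) keys built from the question's sample rows, and using the question's rule that the lowest score is best:

```python
# (name, exam_number) -> score, built from the question's sample rows.
scores = {
    ("student1", 1): 4.2,
    ("student2", 1): 1.02,
    ("student3", 1): 4.1,
    ("student4", 1): 2.089,
    ("student2", 2): 3.02,
    ("student3", 2): 2.54,
    ("student4", 2): 3.69,
    ("student5", 2): 1.34,
}

# A name may repeat across keys; only the (name, exam) pair must be unique.
best = {}
for (name, exam), score in scores.items():
    # Keep the lowest score seen so far (1 is the best score).
    if name not in best or score < best[name][1]:
        best[name] = (exam, score)

print(best["student2"])  # (1, 1.02)
```

The trade-off is that looking up all exams of one student means iterating over every key, whereas a nested dict gives them directly.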
Answer 2 (score: 0)
# Inside the reading loop, instead of scores[name] = { exam_number : score }
if name not in scores:
    scores[name] = {exam_number: score}
else:
    scores[name][exam_number] = score

best_exam = {}
for name, person_results in scores.items():
    best_exam_number = None
    best_score = None
    for exam_number, score in person_results.items():
        if best_score is None or score > best_score:
            best_exam_number = exam_number
            best_score = score
    best_exam[name] = {best_exam_number: best_score}
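The inner loop that tracks the best exam by hand can also be written with the built-in `min` (or `max`) over `items()`, using `operator.itemgetter(1)` to compare the (exam_number, score) pairs by score. A sketch assuming the nested dict is already filled with float scores from the question's sample; it uses `min` because the question treats 1 as the best score.

```python
from operator import itemgetter

# Nested dict shaped like the question's scores: {name: {exam_number: score}}
scores = {
    "student1": {"1": 4.2},
    "student2": {"1": 1.02, "2": 3.02},
    "student3": {"1": 4.1, "2": 2.54},
    "student4": {"1": 2.089, "2": 3.69},
    "student5": {"2": 1.34},
}

best_exam = {}
for name, person_results in scores.items():
    # items() yields (exam_number, score) pairs; itemgetter(1) compares by score.
    # min() follows the "1 is best" rule; use max() for the opposite convention.
    exam_number, score = min(person_results.items(), key=itemgetter(1))
    best_exam[name] = {exam_number: score}

print(best_exam["student2"])  # {'1': 1.02}
```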
Answer 3 (score: 0)
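Use a defaultdict? Put the scores in a list and then retrieve the highest score. You can also use the csv module to read the file, which saves you from having to use regular expressions.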
from collections import defaultdict
from operator import itemgetter
import csv

scores = defaultdict(list)  # { Name : [ {"exam": ..., "score": ...}, ... ] }
with open('filename.csv') as f:
    reader = csv.reader(f, delimiter=",")
    for line in reader:
        student, score, exam = line
        scores[student].append({"exam": exam, "score": float(score)})  # assuming they can only take an exam once

for student, exams in scores.items():
    best = max(exams, key=itemgetter("score"))
    print(student, best)