How do I read and sort a CSV file in Python?

Time: 2015-10-07 17:07:23

Tags: python sorting csv

I'm new to Python and have a CSV file of names and scores, like this:

public class ComboBoxItemWrapper<T>
{
    public T Value { get; set; }
    public string Text { get; set; }
}

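I need to know how to read this file. Two entries with the same name must be shown together: for example, Andrew,10 and Andrew,11 should appear as Andrew,10,11. I also need to be able to sort by name, by highest score, or by average score. If possible, I would also like to use only the last 3 entries for each name. This is the code I tried to use to read and sort by name: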

#example strings:
nextline1 = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4 };"
nextline2 = "DD:MM:YYYY INFO - 'WeeklyMedal: Hole = 1; Par = 4; Index = 2; Distance = 459; Score = { Player1 = 4; Player2 = 6; Player3 = 4 };"

import re
lineRegexp = re.compile(r'.+\'WeeklyMedal:(.+)\'?') #this regexp returns WeeklyMedal record.
weeklyMedalRegexp = re.compile(r'(\w+) = (\{.+\}|\w+)') #this regexp parses WeeklyMedal

#helper recursive function to process WeeklyMedal record. returns dictionary
parseWeeklyMedal = lambda r, info: { k: (int(v) if v.isdigit() else parseWeeklyMedal(r, v)) for (k, v) in r.findall(info)}
parsedLines = []
for line in [nextline1, nextline2]:
    info = lineRegexp.search(line)
    if info:
        #process WeeklyMedal record
        parsedLines.append(parseWeeklyMedal(weeklyMedalRegexp, info.group(0)))
        #or do something with parsed dictionary in place

# do something here with entire result, print for example
print(parsedLines)

2 Answers:

Answer 0: (score: 0)

pandas works very nicely for this:

import pandas as pd

# Read a headerless two-column file of name,score rows
df = pd.read_csv("<pathToFileIN>", index_col=None, header=None)
df.columns = ["name", "x"]
# Join the last 3 scores of each name into one string.
# groupby returns the groups sorted by name, so no separate sort is needed.
out = (
    df.groupby("name")["x"]
    .apply(lambda s: ",".join(str(v) for v in s.values[-3:]))
    .reset_index()
)

out.to_csv("<pathToFileOUT>", index=False, sep=";")

Answer 1: (score: 0)

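To merge the scores, use collections.defaultdict:

import collections

# Reader is assumed to be a csv.reader object yielding [name, score] rows
scores_by_name = collections.defaultdict(list)
for row in Reader:
    name = row[0]
    score = int(row[1])
    scores_by_name[name].append(score)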
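
As a self-contained illustration of the grouping step, here is a minimal sketch that reads from an in-memory string instead of a file. The sample rows and the names `sample` and `lines` are assumptions made up for this example, not the asker's data.

```python
import collections
import csv
import io

# Hypothetical sample data standing in for the real file
sample = "Andrew,10\nBen,7\nAndrew,11\n"

scores_by_name = collections.defaultdict(list)
for row in csv.reader(io.StringIO(sample)):
    if not row:  # skip blank lines
        continue
    name, score = row[0], int(row[1])
    scores_by_name[name].append(score)

# Build one output line per name, in the "Andrew,10,11" form from the question
lines = [",".join([name, *map(str, scores)]) for name, scores in scores_by_name.items()]
print("\n".join(lines))
```

With a real file, `io.StringIO(sample)` would be replaced by a file object opened with `open(path, newline="")`.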

To keep only the last three scores, slice off the last 3 items:

scores_by_name = {name: scores[-3:] for name, scores in scores_by_name.items()}
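
A possible alternative, sketched here rather than taken from the answer, is to cap each list while reading by using collections.deque with maxlen=3, so that older scores are dropped as new ones arrive. The `rows` data and the name `last_three` are hypothetical.

```python
import collections

# Hypothetical rows, as they might come out of csv.reader
rows = [["Andrew", "10"], ["Andrew", "11"], ["Andrew", "12"], ["Andrew", "13"], ["Ben", "7"]]

# Each name maps to a deque that discards its oldest item once it holds 3
last_three = collections.defaultdict(lambda: collections.deque(maxlen=3))
for name, score in rows:
    last_three[name].append(int(score))

# Convert back to plain lists for later sorting or printing
last_three = {name: list(scores) for name, scores in last_three.items()}
print(last_three)
```

This avoids holding every score in memory, which only matters for large files; the slicing approach is simpler for small ones.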

To iterate in alphabetical order:

for name, scores in sorted(scores_by_name.items()):
    ... # whatever

To iterate by highest score:

for name, scores in sorted(scores_by_name.items(), key=(lambda item: max(item[1]))):
    ...
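
The question also asks for sorting by average score. A minimal sketch of that, assuming the same name-to-list mapping as above, uses statistics.mean as the sort key. The sample dictionary and the names `by_average` and `by_highest_desc` are made up for illustration. Note that sorted orders ascending by default, so reverse=True is needed to list the highest values first.

```python
from statistics import mean

# Hypothetical grouped scores, in the same shape as scores_by_name
scores_by_name = {"Andrew": [10, 11], "Ben": [7, 20], "Cara": [15]}

# Ascending by average score
by_average = sorted(scores_by_name.items(), key=lambda item: mean(item[1]))

# Descending by highest score
by_highest_desc = sorted(scores_by_name.items(), key=lambda item: max(item[1]), reverse=True)

for name, scores in by_average:
    print(name, mean(scores))
```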