Grouping, aggregating, and sorting score data

Time: 2016-03-04 07:45:50

Tags: python python-2.7 pandas

I have a list of lists, where each inner list holds "row id", "team name", "team number", "scout", and "score":

teams = [[23L, u'team1', 5713L, u'Gange', 144L], 
 [22L, u'team3', 1406L, u'Gange', 126L], 
 [15L, u'team2', 7319L, u'Bob Loblaw', 90L], 
 [17L, u'team2', 7319L, u'Gange', 54L], 
 [18L, u'team1', 5713L, u'Bob Loblaw', 69L], 
 [16L, u'team3', 1406L, u'Bob Loblaw', 113L]]

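I want to first group the data by the "team number" value, and then get the min/avg/max of the "score" values for each team. I can get all of this information separately with pandas using these functions: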

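import pandas as pd

# teams is the list of lists defined above
res = pd.DataFrame(teams)
res.columns = ['id', 'name', 'number', 'scout', 'score']
# min, mean and max of score for each team number
print res.groupby('number')['score'].min()
print res.groupby('number')['score'].mean()
print res.groupby('number')['score'].max()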

number
1406    113
5713     69
7319     54
Name: score, dtype: int64

number
1406    119.5
5713    106.5
7319     72.0
Name: score, dtype: float64

number
1406    126
5713    144
7319     90
Name: score, dtype: int64
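
The three calls above each run their own groupby. As a minimal sketch (assuming pandas is installed), the same statistics can be requested in one call by passing a list of function names to `agg`. The `L` long-integer suffix from the Python 2 literals is dropped here so the sketch also runs on Python 3; the variable name `stats` is only illustrative.

```python
import pandas as pd

teams = [[23, u'team1', 5713, u'Gange', 144],
         [22, u'team3', 1406, u'Gange', 126],
         [15, u'team2', 7319, u'Bob Loblaw', 90],
         [17, u'team2', 7319, u'Gange', 54],
         [18, u'team1', 5713, u'Bob Loblaw', 69],
         [16, u'team3', 1406, u'Bob Loblaw', 113]]

res = pd.DataFrame(teams, columns=['id', 'name', 'number', 'scout', 'score'])

# One groupby, three aggregates: the result is a DataFrame indexed by team
# number with one column per statistic ('min', 'mean', 'max').
stats = res.groupby('number')['score'].agg(['min', 'mean', 'max'])
print(stats)
```

This still leaves the statistics in separate columns rather than in a single list/tuple per team.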

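My problem is that I want to keep all of the original columns except the score, effectively collapsing the rows into a single row per team, and replace the score column with a list/tuple holding the min, avg, and max values from the rows of the same team. I also need the result as a Python object that I can pass to a form, and I'm not sure pandas is the best module for that.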

I've looked at some samples using itertools, pandas, numpy, and so on, but I don't know how to approach this right now. Thanks in advance for any suggestions.

1 Answer:

Answer 0 (score: 1)

Python comes with batteries included. You can use the power of SQLite through the standard-library sqlite3 module.

import sqlite3

teams = [[23L, u'team1', 5713L, u'Gange', 144L],
 [22L, u'team3', 1406L, u'Gange', 126L],
 [15L, u'team2', 7319L, u'Bob Loblaw', 90L],
 [17L, u'team2', 7319L, u'Gange', 54L],
 [18L, u'team1', 5713L, u'Bob Loblaw', 69L],
 [16L, u'team3', 1406L, u'Bob Loblaw', 113L]]

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table t (id int, team_name text, team_number int, scout text, team_score int)")
cur.executemany("insert into t values(?, ?, ?, ?, ?)", teams)
con.commit()

res = cur.execute("""
  SELECT team_number, min(team_score), max(team_score), avg(team_score)
    FROM t
GROUP BY team_number
ORDER BY team_number""")

print "team_number, min, max, avg"
for row in res:
    print row

Output:

team_number, min, max, avg
(1406, 113, 126, 119.5)
(5713, 69, 144, 106.5)
(7319, 54, 90, 72.0)
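
The query above returns only the team number and the three statistics. The question also asks for one row per team with the score replaced by a (min, avg, max) tuple, as a plain Python object. A minimal sketch of one way to get there with the same sqlite3 approach follows. It assumes that only the team name and team number are kept, since the row id and scout differ between rows of the same team; the dictionary keys and the name `collapsed` are illustrative. The `L` suffix is dropped so the code also runs on Python 3.

```python
import sqlite3

teams = [[23, u'team1', 5713, u'Gange', 144],
         [22, u'team3', 1406, u'Gange', 126],
         [15, u'team2', 7319, u'Bob Loblaw', 90],
         [17, u'team2', 7319, u'Gange', 54],
         [18, u'team1', 5713, u'Bob Loblaw', 69],
         [16, u'team3', 1406, u'Bob Loblaw', 113]]

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table t (id int, team_name text, team_number int, "
            "scout text, team_score int)")
cur.executemany("insert into t values (?, ?, ?, ?, ?)", teams)

# Group on both team columns so team_name can be selected alongside
# the aggregates; sort by team number for a stable order.
cur.execute("""
    SELECT team_name, team_number,
           min(team_score), avg(team_score), max(team_score)
      FROM t
  GROUP BY team_number, team_name
  ORDER BY team_number""")

# One dict per team; 'score' holds the (min, avg, max) tuple.
collapsed = [{'name': name, 'number': number, 'score': (lo, avg, hi)}
             for name, number, lo, avg, hi in cur.fetchall()]
con.close()

for row in collapsed:
    print(row)
```

The result is an ordinary list of dictionaries, so it can be handed to a form or template without any pandas dependency.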