I have a list of lists, where each inner list holds "row id", "team name", "team number", "scout", and "score":
teams = [[23L, u'team1', 5713L, u'Gange', 144L],
[22L, u'team3', 1406L, u'Gange', 126L],
[15L, u'team2', 7319L, u'Bob Loblaw', 90L],
[17L, u'team2', 7319L, u'Gange', 54L],
[18L, u'team1', 5713L, u'Bob Loblaw', 69L],
[16L, u'team3', 1406L, u'Bob Loblaw', 113L]]
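I want to first group the data by the "team number" value, and then get the min/avg/max of the "score" value for each team. With pandas I can get each of these separately using these functions:
import pandas as pd
res = pd.DataFrame(teams)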
res.columns = ['id', 'name', 'number', 'scout', 'score']
print res.groupby('number')['score'].min()
print res.groupby('number')['score'].mean()
print res.groupby('number')['score'].max()
number
406 0
5703 9
7129 18
Name: score, dtype: int64
number
406 9.0
5703 22.5
7129 27.0
Name: score, dtype: float64
number
406 18
5703 36
7129 36
Name: score, dtype: int64
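My problem is that I want to keep all of the original columns except score, effectively collapsing the rows into a single row per team, and replace the score column with a list/tuple holding the min, avg, and max values from the rows of the same team. I also need to output the result to a Python object that I can pass to a form, and I'm not sure pandas is the best module for that.
I've seen a few samples that use itertools, pandas, numpy, and so on, but I can't figure out how to approach this. Thanks in advance for any suggestions.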
Answer 0 (score: 1)
Python comes with batteries included. You can use the power of SQLite through the sqlite3 module.
import sqlite3
teams = [[23L, u'team1', 5713L, u'Gange', 144L],
[22L, u'team3', 1406L, u'Gange', 126L],
[15L, u'team2', 7319L, u'Bob Loblaw', 90L],
[17L, u'team2', 7319L, u'Gange', 54L],
[18L, u'team1', 5713L, u'Bob Loblaw', 69L],
[16L, u'team3', 1406L, u'Bob Loblaw', 113L]]
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table t (id int, team_name text, team_number int, scout text, team_score int)")
cur.executemany("insert into t values(?, ?, ?, ?, ?)", teams)
con.commit()
res = cur.execute("""
SELECT team_number, min(team_score), max(team_score), avg(team_score)
FROM t
GROUP BY team_number""")
print "team_number, min, max, avg"
for row in res:
    print row
Output:
team_number, min, max, avg
(1406, 113, 126, 119.5)
(5713, 69, 144, 106.5)
(7319, 54, 90, 72.0)
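To get closer to what the question asks for (one row per team, with the score replaced by a min/avg/max tuple), the same query can be extended a little. The following is only a sketch under some assumptions: it uses Python 3 syntax rather than the Python 2 syntax above, the helper name `summarize` and the dict keys `number`, `name`, and `score` are made up for illustration, and it keeps only `team_name` and `team_number`, since `id` and `scout` differ between rows of the same team and cannot be collapsed into one value.

```python
import sqlite3

# Same sample data as above, written without the Python 2 `L` suffix.
teams = [
    [23, "team1", 5713, "Gange", 144],
    [22, "team3", 1406, "Gange", 126],
    [15, "team2", 7319, "Bob Loblaw", 90],
    [17, "team2", 7319, "Gange", 54],
    [18, "team1", 5713, "Bob Loblaw", 69],
    [16, "team3", 1406, "Bob Loblaw", 113],
]


def summarize(rows):
    """Return one dict per team with a (min, avg, max) score tuple.

    `summarize` is a hypothetical helper, not part of sqlite3.
    """
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "CREATE TABLE t (id INT, team_name TEXT, team_number INT, "
            "scout TEXT, team_score INT)"
        )
        con.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
        # Group on both columns that are constant within a team.
        cur = con.execute(
            "SELECT team_number, team_name, "
            "MIN(team_score), AVG(team_score), MAX(team_score) "
            "FROM t GROUP BY team_number, team_name "
            "ORDER BY team_number"
        )
        return [
            {"number": number, "name": name, "score": (lo, avg, hi)}
            for number, name, lo, avg, hi in cur
        ]
    finally:
        con.close()


summary = summarize(teams)
for team in summary:
    print(team)
```

The result is a plain list of dicts, which should be easy to hand to a form or template. If the per-row `scout` values are also needed, one option would be to collect them separately, for example with SQLite's `group_concat` aggregate.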