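I have a DataFrame and a for loop driven by a dictionary that defines how to handle particular column names, taken from my previous question: Pandas Generating dataframe based on columns being present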
import pandas as pd

df = pd.DataFrame({'Players': ['Sam', 'Greg', 'Steve', 'Sam',
                               'Greg', 'Steve', 'Greg', 'Steve', 'Greg', 'Steve'],
                   'Wins': [10, 5, 5, 20, 30, 20, 6, 9, 3, 10],
                   'Losses': [5, 5, 5, 2, 3, 2, 16, 20, 3, 12],
                   'Type': ['A', 'B', 'B', 'B', 'A', 'B', 'B', 'A', 'A', 'B'],
                   })
p = df.groupby('Players')

sumdict = {'Total Games': (None, 'count'),
           'Average Wins': ('Wins', 'mean'),
           'Greatest Wins': ('Wins', 'max'),
           'Unique games': ('Type', 'nunique'),
           'Max Score': ('Score', 'max')}

summary = []
for key, (column, op) in sumdict.items():
    if column is None:
        res = p.agg(op).max(axis=1)
    elif column not in df:
        continue
    else:
        res = p[column].agg(lambda x: getattr(x, op)())
    summary.append(pd.DataFrame({key: res}))
summary = pd.concat(summary, axis=1)
The code works in almost every case except an apply function that counts the rows in a column matching a condition:
streak = pd.DataFrame({'Streak':p.Wins.apply(lambda x: (x > 5).sum())})
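A minimal sketch of why this can fit into the dictionary, assuming a reasonably recent pandas (1.x or 2.x): for a function that reduces each group to a single number, `SeriesGroupBy.agg` returns the same values as `apply`. The trimmed DataFrame (only `Players` and `Wins`) and the helper name `count_over_5` are illustrative, not part of the original question.

```python
import pandas as pd

df = pd.DataFrame({
    "Players": ["Sam", "Greg", "Steve", "Sam", "Greg",
                "Steve", "Greg", "Steve", "Greg", "Steve"],
    "Wins": [10, 5, 5, 20, 30, 20, 6, 9, 3, 10],
})
p = df.groupby("Players")


def count_over_5(x):
    """Hypothetical helper: number of values in the group greater than 5."""
    return (x > 5).sum()


# apply and agg both call the function once per group and collect the scalars.
via_apply = p["Wins"].apply(count_over_5)
via_agg = p["Wins"].agg(count_over_5)

print(via_apply.to_dict() == via_agg.to_dict())
```

Because `agg` accepts the callable directly, the function can be stored as the `op` half of a `(column, op)` pair.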
Is there a way to fold this apply function into the sumdict dictionary?
Answer 0 (score: 0)
You have a couple of options.
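IMO option 2 is a bit cleaner (though perhaps less well known?): g.agg("max") works as an alias for g.max(), and agg also accepts a function, so you can add: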
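A quick check of the alias claim, a sketch using the same `Players`/`Wins` data (trimmed for brevity): passing a method name as a string to `agg` produces the same Series as calling that method directly.

```python
import pandas as pd

df = pd.DataFrame({
    "Players": ["Sam", "Greg", "Steve", "Sam", "Greg",
                "Steve", "Greg", "Steve", "Greg", "Steve"],
    "Wins": [10, 5, 5, 20, 30, 20, 6, 9, 3, 10],
})
g = df.groupby("Players")["Wins"]

# A string passed to agg is resolved to the groupby method of that name,
# so each pair below yields an identical Series.
assert g.agg("max").equals(g.max())
assert g.agg("mean").equals(g.mean())
print(g.agg("max").to_dict())
```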
sumdict["Streak"] = "Wins", lambda x: (x > 5).sum()
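Then change the loop as follows; the commented line is the only change (getattr(x, op)() only works when op is a method-name string, whereas agg takes either a string or a callable):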
summary = []
for key, (column, op) in sumdict.items():
    if column is None:
        res = p.agg(op).max(axis=1)
    elif column not in df:
        continue
    else:
        res = p[column].agg(op)  # just use the string (or it could be a func)
    summary.append(pd.DataFrame({key: res}))
summary = pd.concat(summary, axis=1)
and Streak now works as expected:
In [23]: summary
Out[23]:
         Greatest Wins  Total Games  Streak  Average Wins  Unique games
Players
Greg                30            4       2            11             2
Sam                 20            2       2            15             2
Steve               20            4       3            11             2
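For reference, here is the whole answer as one self-contained script, a sketch assuming pandas 1.0 or later. The named helper `count_over_5` is a hypothetical stand-in for the lambda, used only so the dictionary reads more clearly. On Python 3.7+ dicts keep insertion order, so the columns come out in sumdict order rather than the order shown in the output above.

```python
import pandas as pd

df = pd.DataFrame({
    "Players": ["Sam", "Greg", "Steve", "Sam", "Greg",
                "Steve", "Greg", "Steve", "Greg", "Steve"],
    "Wins": [10, 5, 5, 20, 30, 20, 6, 9, 3, 10],
    "Losses": [5, 5, 5, 2, 3, 2, 16, 20, 3, 12],
    "Type": ["A", "B", "B", "B", "A", "B", "B", "A", "A", "B"],
})
p = df.groupby("Players")


def count_over_5(x):
    """Hypothetical helper: number of values in the group greater than 5."""
    return (x > 5).sum()


# Each value is (column, op); op may be a method-name string or a callable.
sumdict = {
    "Total Games": (None, "count"),
    "Average Wins": ("Wins", "mean"),
    "Greatest Wins": ("Wins", "max"),
    "Unique games": ("Type", "nunique"),
    "Max Score": ("Score", "max"),  # skipped below: df has no Score column
    "Streak": ("Wins", count_over_5),
}

summary = []
for key, (column, op) in sumdict.items():
    if column is None:
        # Apply op to every column, then keep the largest result per player.
        res = p.agg(op).max(axis=1)
    elif column not in df:
        continue
    else:
        # agg accepts either a method name or a function.
        res = p[column].agg(op)
    summary.append(pd.DataFrame({key: res}))
summary = pd.concat(summary, axis=1)
print(summary)
```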