我有一个数据框,最初包含两个列,Home,如果游戏是家中的玩家,则为1,否则为0,PTS,记录玩家在给定游戏中得分的点数。我想最终得到第三列,一个滚动指标,表示玩家在家里玩的敏感程度。我将按如下方式计算:
家庭敏感度=(平均PTS家庭 - 平均PTS离开)/平均PTS
我在下面的代码中成功地做到了这一点,但感觉很麻烦,因为我创建了许多我最不需要的列。如何更直接地解决这个问题?
df=pd.DataFrame({'Home':[1,0,1,0,1,0,1,0], 'PTS':[11, 10, 12, 11, 13, 12, 14, 12]})
df.loc[testDF['Home'] == 1, 'Home PTS'] = df['PTS']
df.loc[testDF['Home'] == 0, 'Away PTS'] = df['PTS']
df['Home PTS'] = df['Home PTS'].fillna(0)
df['Away PTS'] = df['Away PTS'].fillna(0)
df['Home Sum'] = df['Home PTS'].expanding(min_periods=1).sum()
df['Away Sum'] = df['Away PTS'].expanding(min_periods=1).sum()
df['Home Count']=df['Home'].expanding().sum()
df['Index']=df.index+1
df['Away Count']=df['Index']-df['Home Count']
df['Home Average']=df['Home Sum']/df['Home Count']
df['Away Average']=df['Away Sum']/df['Away Count']
df['Average']=df['PTS'].expanding().mean()
df['Metric']=(df['Home Average']-df['Away Average'])/df['Average']
答案 0 :(得分:0)
这是一种天真的方法:在循环中占用越来越大的DataFrame片段;在每个切片上进行数学运算并将其存储在列表中;将列表分配给DataFrame的新列(使用testDF):
df = tesdDF
sens = []
for i in range(len(df)):
d = df[:i]
mean_pts = d.PTS.mean()
home = d[d.Home == 1].PTS.mean()
away = d[d.Home == 0].PTS.mean()
#print(home, away, (home - away) / mean_pts)
sens.append((home - away) / mean_pts)
df['sens'] = sens
>>> df
Home PTS sens
0 1 11 NaN
1 0 10 NaN
2 1 12 0.095238
3 0 11 0.136364
4 1 13 0.090909
5 0 12 0.131579
6 1 14 0.086957
7 0 12 0.126506
使用DataFrame.expanding():还没有......
>>> mean_pts = df.PTS.expanding(1).mean()
>>> away = df[df['Home'] == 0].PTS.expanding(1).mean()
>>> home = df[df['Home'] == 1].PTS.expanding(1).mean()
>>>
>>> home
0 11.0
2 11.5
4 12.0
6 12.5
Name: PTS, dtype: float64
>>> away
1 10.00
3 10.50
5 11.00
7 11.25
Name: PTS, dtype: float64
>>> mean_pts
0 11.000000
1 10.500000
2 11.000000
3 11.000000
4 11.400000
5 11.500000
6 11.857143
7 11.875000
Name: PTS, dtype: float64
>>>
要进行数学运算需要更多的操作
您无法直接获得home
和away
之间的差异,因为索引不同 - 但您可以这样做...
>>> home.values - away.values
array([ 1. , 1. , 1. , 1.25])
>>>
此外home
和away
只有四行,mean_pts
有八行。
我使用以下功能尝试了.expanding(1).apply()
并且没有达到我的预期,expanding
没有将两列都传递给该函数,它似乎传递了一列然后其他;所以我揍了......
def f(thing):
print(thing, '***')
return thing.mean()
>>> df.expanding(1).apply(f)
[ 1.] ***
[ 1. 0.] ***
[ 1. 0. 1.] ***
[ 1. 0. 1. 0.] ***
[ 1. 0. 1. 0. 1.] ***
[ 1. 0. 1. 0. 1. 0.] ***
[ 1. 0. 1. 0. 1. 0. 1.] ***
[ 1. 0. 1. 0. 1. 0. 1. 0.] ***
[ 11.] ***
[ 11. 10.] ***
[ 11. 10. 12.] ***
[ 11. 10. 12. 11.] ***
[ 11. 10. 12. 11. 13.] ***
[ 11. 10. 12. 11. 13. 12.] ***
[ 11. 10. 12. 11. 13. 12. 14.] ***
[ 11. 10. 12. 11. 13. 12. 14. 12.] ***