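I have a multi-index DataFrame and I am trying to count consecutive winners.
The problem is that some NaN values are scattered through the columns when I try to count consecutive winners.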
                        week_1  week_2  week_3  week_4  week_5  week_6  \
Year
2000 Arizona Cardinals   loser  winner   loser   loser  winner   loser
     Atlanta Falcons    winner   loser  winner   loser   loser   loser
     Baltimore Ravens   winner     NaN  winner  winner  winner  winner
     Buffalo Bills         NaN  winner   loser   loser   loser  winner
     Carolina Panthers   loser  winner   loser   loser  winner   loser
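I can compare neighbouring weeks with df3 = df.shift(-1, axis=1).isin(['winner']), but this does not skip the NaN values.
So a row like this:
Baltimore Ravens winner NaN winner
which should count as consecutive, gets skipped.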
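To make the problem concrete, here is a minimal sketch of a shift-based comparison on the Baltimore Ravens row alone. The one-row frame and the names is_win, next_is_win, and back_to_back are assumptions introduced for illustration; they are not from the question.

```python
import numpy as np
import pandas as pd

# One row from the question, with a NaN in week_2.
row = pd.DataFrame(
    [["winner", np.nan, "winner", "winner", "winner", "winner"]],
    index=["Baltimore Ravens"],
    columns=[f"week_{n}" for n in range(1, 7)],
)

# True where both the current week and the following week are 'winner'.
is_win = row.eq("winner")
next_is_win = row.shift(-1, axis=1).eq("winner")
back_to_back = is_win & next_is_win

# Count of adjacent winner/winner pairs found by the naive comparison.
naive_pairs = int(back_to_back.loc["Baltimore Ravens"].sum())
print(naive_pairs)  # 3
```

Five wins in a row would give four adjacent pairs, but the comparison finds only three, because week_1 is compared against the NaN in week_2.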
Answer 0 (score: 1)
I tried to find a vectorized solution but did not manage to.
This can be solved easily with a simple Python loop over each row:
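def find_wins(x):
    mw = 0  # longest winning streak seen so far
    c = 0   # length of the current streak
    for e in x.dropna():  # NaN entries are dropped before counting
        c = c + 1 if e == 'winner' else 0
        mw = max(mw, c)
    return mw

res = df.apply(find_wins, axis=1)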
With your original DataFrame df, this returns the following Series res:
year
2000  Arizona Cardinals    1
      Atlanta Falcons      1
      Baltimore Ravens     5
      Buffalo Bills        1
      Carolina Panthers    1
dtype: int64
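where each element is the maximum number of consecutive wins (with NaN values skipped).
The idea is to drop the NaN values with x.dropna() before looping over each row, and then count the consecutive 'winner' entries.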
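The effect of dropna() can be seen by tracing the running counter on the Baltimore Ravens row. This is only a sketch: running_streaks is a hypothetical helper written for this illustration, and it uses the same counting rule as the loop in the answer.

```python
import numpy as np
import pandas as pd


def running_streaks(values):
    """Hypothetical helper: running count of consecutive 'winner' entries."""
    counts = []
    current = 0
    for value in values:
        # Any non-'winner' entry (including NaN) resets the counter.
        current = current + 1 if value == "winner" else 0
        counts.append(current)
    return counts


ravens = pd.Series(
    ["winner", np.nan, "winner", "winner", "winner", "winner"],
    index=[f"week_{n}" for n in range(1, 7)],
)

with_nan = running_streaks(ravens)              # NaN resets the streak
without_nan = running_streaks(ravens.dropna())  # NaN removed first

print(with_nan)     # [1, 0, 1, 2, 3, 4]
print(without_nan)  # [1, 2, 3, 4, 5]
```

Without dropna() the longest streak is 4, because the NaN breaks it; after dropna() it is 5, which matches the value shown for Baltimore Ravens.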
Answer 1 (score: 1)
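To drop the NaN values and shift the remaining values over, you can use dropna together with apply along axis 1. A few adjustments are needed so that the values line up under new column labels: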
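# Drop the NaN entries from each row and renumber the remaining games from 0
no_bye = df.apply(lambda x: x.dropna().reset_index(drop=True), axis=1)
# Rename the columns game_1, game_2, ... (the original answer hard-coded range(16))
no_bye.columns = ['game_' + str(n + 1) for n in range(no_bye.shape[1])]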
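As a rough check of what this produces, the sketch below applies the same steps to the six weeks shown in the question. The frame is rebuilt by hand with a plain team index (the Year level is left out for brevity), and the short names W, L, and N are assumptions made for this example.

```python
import numpy as np
import pandas as pd

W, L, N = "winner", "loser", np.nan

# Sample rows copied from the question (six weeks only, Year level omitted).
df = pd.DataFrame(
    [
        [L, W, L, L, W, L],
        [W, L, W, L, L, L],
        [W, N, W, W, W, W],
        [N, W, L, L, L, W],
        [L, W, L, L, W, L],
    ],
    index=[
        "Arizona Cardinals",
        "Atlanta Falcons",
        "Baltimore Ravens",
        "Buffalo Bills",
        "Carolina Panthers",
    ],
    columns=[f"week_{n}" for n in range(1, 7)],
)

# Drop NaN per row and renumber, so the played games sit side by side.
no_bye = df.apply(lambda x: x.dropna().reset_index(drop=True), axis=1)
no_bye.columns = ["game_" + str(n + 1) for n in range(no_bye.shape[1])]

print(no_bye.loc["Baltimore Ravens"].tolist())
```

Rows that contained a NaN end up one game shorter, so the NaN moves to the last column instead of sitting between two results.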