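I have a MultiIndex DataFrame, and I am trying to fill the column MAX_PTS_YR so that its value in year t+1 equals the MAX_PTS_YR of year t.
So MAX_PTS_YR for the 2016 rows should equal 116.
Using nth, I found the MAX_PTS for the previous year: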
DF['MAX_PTS_YR'] = DF.groupby(by=['Affiliation','Year'],as_index=False)['PtsYr'].nth(-1)
Affiliation mkid Year PtsYr MAX_PTS_YR
MVPAFL0003 10176228 2015 96.0 NaN
MVPAFL0003 10176228 2015 96.0 NaN
MVPAFL0003 10176228 2015 106.0 NaN
MVPAFL0003 10176228 2015 116.0 116.0
MVPAFL0003 10176228 2016 10.0 NaN
MVPAFL0003 10176228 2016 10.0 NaN
MVPAFL0003 10176228 2016 20.0 NaN
MVPAFL0003 10176228 2016 20.0 NaN
MVPAFL0003 10176228 2016 30.0 NaN
MVPAFL0003 10176228 2016 40.0 NaN
MVPAFL0003 10176228 2016 50.0 NaN
MVPAFL0003 10176228 2016 50.0 NaN
MVPAFL0003 10176228 2016 52.0 NaN
MVPAFL0003 10176228 2016 62.0 NaN
MVPAFL0003 10176228 2016 62.0 NaN
MVPAFL0003 10176228 2016 82.0 NaN
MVPAFL0003 10176228 2016 94.0 NaN
MVPAFL0003 10176228 2016 94.0 NaN
MVPAFL0003 10176228 2016 94.0 NaN
MVPAFL0003 10176228 2016 104.0 NaN
MVPAFL0003 10176228 2016 114.0 114.0
I thought I could use fillna within each Affiliation group:
DF.groupby(by=['Affiliation'],as_index=False)['MAX_PTS_AFFIL'].fillna(method='ffill',inplace=True)
But when I do this, none of the NaN values get filled.
Any ideas?
Answer 0 (score: 1)
# get just the series you are filling to simplify things
s1 = df.set_index(['Affiliation', 'Year']).MAX_PTS_YR
# groupby to get the max per group
mx = s1.groupby(level=[0, 1]).max()
# shift your year index by one year
# (set_levels returns a new index; its inplace option was removed in pandas 2.0)
mx.index = mx.index.set_levels(mx.index.levels[1] + 1, level=1)
# fill in missing bits
s1.fillna(mx)
Affiliation Year
MVPAFL0003 2015 NaN
2015 NaN
2015 NaN
2015 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 116.0
2016 114.0
Name: MAX_PTS_YR, dtype: float64
Now assign the result back to df:
df.MAX_PTS_YR = (s1.fillna(mx).values)
df
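The same "shift the year by one" idea can also be sketched without touching the index levels, by building a small lookup table and merging it back. This is only an illustration: the toy frame below and the column name PREV_MAX_PTS_YR are assumptions, not part of the original data, and it assumes the rows are already ordered within each year so that the last PtsYr is the yearly maximum.

```python
import pandas as pd

# Hypothetical toy frame shaped like the one in the question.
df = pd.DataFrame({
    "Affiliation": ["MVPAFL0003"] * 6,
    "Year": [2015, 2015, 2015, 2016, 2016, 2016],
    "PtsYr": [96.0, 106.0, 116.0, 10.0, 20.0, 114.0],
})

# Last PtsYr per (Affiliation, Year); assumes rows are ordered within each year.
last = df.groupby(["Affiliation", "Year"])["PtsYr"].last()

# Move each value one year forward so that year t's value is keyed by t+1.
prev = last.reset_index()
prev["Year"] += 1
prev = prev.rename(columns={"PtsYr": "PREV_MAX_PTS_YR"})  # hypothetical column name

# A left merge keeps every original row; years with no predecessor get NaN.
out = df.merge(prev, on=["Affiliation", "Year"], how="left")
print(out)
```

Here every 2016 row receives 116.0 and the 2015 rows stay NaN, since there is no 2014 data to carry forward.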
Answer 1 (score: 1)
If this is the only column with missing data, you can run the operation on the whole DataFrame:
DF.ffill(inplace=True)
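One caveat: a whole-frame forward fill can carry a value across the boundary between two affiliations. A minimal sketch of a per-group fill follows, assuming a toy frame with made-up affiliations "A" and "B". It also assigns the result back rather than relying on inplace, because the grouped fill returns a new Series.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with two affiliations.
df = pd.DataFrame({
    "Affiliation": ["A", "A", "A", "B", "B"],
    "MAX_PTS_YR": [np.nan, 116.0, np.nan, np.nan, 50.0],
})

# GroupBy.ffill returns a new Series aligned to the original index,
# so the result has to be assigned back to the column.
df["MAX_PTS_YR"] = df.groupby("Affiliation")["MAX_PTS_YR"].ffill()
print(df)
```

The first row of "B" stays NaN instead of inheriting 116.0 from "A", which is usually the behaviour you want when groups are independent.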
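Note that even if you originally entered the points as integers, you get floats back. This is because NaN is technically a float, which forces the dtype of the whole column. To get integers (probably what you want, unless partial points are possible), do the following: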
# astype has no inplace option; it returns a new Series, so assign it back
DF['MAX_PTS_YR'] = DF['MAX_PTS_YR'].astype('int64')
You may want to do the same for the PtsYr column.
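If some rows are still NaN after filling (for example, the first year of a group), a conversion to plain int64 raises an error. One possible workaround, sketched here on a made-up Series, is pandas' nullable integer dtype "Int64" (capital I), which keeps missing entries as <NA>. This assumes a pandas version that supports nullable integers and that all non-missing values are whole numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one value still missing after the fill.
s = pd.Series([np.nan, 116.0, 116.0, 114.0], name="MAX_PTS_YR")

# The nullable integer dtype stores whole numbers and keeps NaN as <NA>.
as_int = s.astype("Int64")
print(as_int)
```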