Question

I have a MultiIndex DataFrame, and I am trying to fill the MAX_PTS_YR column so that the MAX_PTS_YR value for year t+1 equals the MAX_PTS_YR value of year t.

So: MAX_PTS_YR for 2016 should equal 116.

Using nth, I found the previous year's MAX_PTS:

DF['MAX_PTS_YR'] = DF.groupby(by=['Affiliation','Year'],as_index=False)['PtsYr'].nth(-1)


Affiliation mkid        Year    PtsYr  MAX_PTS_YR
MVPAFL0003  10176228    2015    96.0    NaN
MVPAFL0003  10176228    2015    96.0    NaN
MVPAFL0003  10176228    2015    106.0   NaN
MVPAFL0003  10176228    2015    116.0   116.0
MVPAFL0003  10176228    2016    10.0    NaN
MVPAFL0003  10176228    2016    10.0    NaN
MVPAFL0003  10176228    2016    20.0    NaN
MVPAFL0003  10176228    2016    20.0    NaN
MVPAFL0003  10176228    2016    30.0    NaN
MVPAFL0003  10176228    2016    40.0    NaN
MVPAFL0003  10176228    2016    50.0    NaN
MVPAFL0003  10176228    2016    50.0    NaN
MVPAFL0003  10176228    2016    52.0    NaN
MVPAFL0003  10176228    2016    62.0    NaN
MVPAFL0003  10176228    2016    62.0    NaN
MVPAFL0003  10176228    2016    82.0    NaN
MVPAFL0003  10176228    2016    94.0    NaN
MVPAFL0003  10176228    2016    94.0    NaN
MVPAFL0003  10176228    2016    94.0    NaN
MVPAFL0003  10176228    2016    104.0   NaN
MVPAFL0003  10176228    2016    114.0   114.0

I thought I could use fillna within the Affiliation groups:

DF.groupby(by=['Affiliation'],as_index=False)['MAX_PTS_AFFIL'].fillna(method='ffill',inplace=True)

But when I do this, none of the NaN values are filled.

Any ideas?

Answer 1

# get just the series you are filling to simplify things
s1 = df.set_index(['Affiliation', 'Year']).MAX_PTS_YR

# groupby to get the max per group
mx = s1.groupby(level=[0, 1]).max()

# shift your year index by one year
# (set_levels returns a new index; its inplace argument was removed in pandas 2.0)
mx.index = mx.index.set_levels(mx.index.levels[1] + 1, level=1)

# fill in missing bits
s1.fillna(mx)

Affiliation  Year
MVPAFL0003   2015      NaN
             2015      NaN
             2015      NaN
             2015    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    116.0
             2016    114.0
Name: MAX_PTS_YR, dtype: float64

Now assign it back to df:

df.MAX_PTS_YR = (s1.fillna(mx).values)
df
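
As a variation on the steps above, the previous year's maximum can also be looked up per row without editing the index levels. This is only a minimal sketch: the small frame and the column name PREV_MAX_PTS_YR are assumptions made for illustration, not part of the original data. Unlike the output shown above, it fills every row of a year, including the last one.

```python
import pandas as pd

# Small frame shaped like the question's data (values are illustrative).
df = pd.DataFrame({
    "Affiliation": ["MVPAFL0003"] * 7,
    "Year": [2015, 2015, 2015, 2016, 2016, 2016, 2016],
    "PtsYr": [96.0, 106.0, 116.0, 10.0, 20.0, 104.0, 114.0],
})

# Maximum PtsYr per (Affiliation, Year); this index is unique.
mx = df.groupby(["Affiliation", "Year"])["PtsYr"].max()

# Build one lookup key per row pointing at the previous year.
keys = pd.MultiIndex.from_arrays([df["Affiliation"], df["Year"] - 1])

# Keys with no match (no previous year on record) come back as NaN.
# PREV_MAX_PTS_YR is a hypothetical column name chosen for this sketch.
df["PREV_MAX_PTS_YR"] = mx.reindex(keys).to_numpy()
```

Here the 2015 rows stay NaN because there is no 2014 data, and every 2016 row receives the 2015 maximum.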

Answer 2

If this is the only column with missing data, you can perform the operation on the whole DataFrame:

DF.ffill(inplace=True)
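
A plain forward fill also carries values across Affiliation boundaries. If that is not wanted, one possible sketch is to forward fill inside each group and assign the result back, since the groupby method returns a new Series instead of changing DF. That is likely also why the inplace=True call in the question appeared to do nothing. The two-group frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative frame with two affiliations (not the question's data).
DF = pd.DataFrame({
    "Affiliation": ["A", "A", "A", "B", "B"],
    "MAX_PTS_YR": [np.nan, 116.0, np.nan, np.nan, 7.0],
})

# Forward fill within each Affiliation, then assign the new Series back.
DF["MAX_PTS_YR"] = DF.groupby("Affiliation")["MAX_PTS_YR"].ffill()
```

The first row of group B stays NaN here, whereas DF.ffill() would have copied 116.0 into it from group A.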

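Note that floats are returned even if you originally entered the points as integers. This is because NaN is technically a float, and it forces the dtype of the whole column. To get integers (which is probably what you want, unless partial points are possible), do the following: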

# astype has no inplace argument; it returns a new Series, so assign it back
DF['MAX_PTS_YR'] = DF['MAX_PTS_YR'].astype('int64')

You may want to do the same for the PtsYr column.
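
If any NaN values remain after filling (for example, the leading rows of the first year), a conversion to 'int64' raises an error. One option, sketched here on made-up values, is the nullable 'Int64' dtype (capital I), which stores integers and keeps missing entries as <NA>.

```python
import numpy as np
import pandas as pd

# Illustrative column that still has one missing value after filling.
DF = pd.DataFrame({"MAX_PTS_YR": [np.nan, 116.0, 116.0, 114.0]})

# 'Int64' is the nullable integer dtype; NaN becomes <NA> instead of failing.
DF["MAX_PTS_YR"] = DF["MAX_PTS_YR"].astype("Int64")
```

This only works when the non-missing values are whole numbers; fractional points would need to stay as floats.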

pandas | Fillna (ffill) on a grouped DataFrame is not filling
