How do I fill the last row of each group in Pandas?

Date: 2014-01-22 13:43:21

Tags: python pandas

I have a DataFrame df in which the last row of each group (grouped by STK_ID) is NaN:

>>> print(df)
                   sales  opr_pft  net_pft
STK_ID RPT_Date                           
002138 20130331   2.0703   0.3373   0.2829
       20130630      NaN      NaN      NaN
       20130930   7.4993   1.2248   1.1630
       20140122      NaN      NaN      NaN
600004 20130331  11.8429   3.0816   2.1637
       20130630  24.6232   6.2152   4.5135
       20130930  37.9673   9.2088   6.6463
       20140122      NaN      NaN      NaN
600809 20130331  27.9517   9.9426   7.5182
       20130630  40.6460  13.9414   9.8572
       20130930  53.0501  16.8081  11.8605
       20140122      NaN      NaN      NaN

Now I want to fillna only the last row of each group, using the values from the row just before it. The result should look like this:

                   sales  opr_pft  net_pft
STK_ID RPT_Date                           
002138 20130331   2.0703   0.3373   0.2829
       20130630      NaN      NaN      NaN    **(do not fill this row)**
       20130930   7.4993   1.2248   1.1630
       20140122   7.4993   1.2248   1.1630
600004 20130331  11.8429   3.0816   2.1637
       20130630  24.6232   6.2152   4.5135
       20130930  37.9673   9.2088   6.6463
       20140122  37.9673   9.2088   6.6463
600809 20130331  27.9517   9.9426   7.5182
       20130630  40.6460  13.9414   9.8572
       20130930  53.0501  16.8081  11.8605
       20140122  53.0501  16.8081  11.8605

I am almost there with df.groupby(level=0).apply(lambda grp: grp.fillna(method='ffill')), which produces:

                   sales  opr_pft  net_pft
STK_ID RPT_Date                           
002138 20130331   2.0703   0.3373   0.2829
       20130630   2.0703   0.3373   0.2829
       20130930   7.4993   1.2248   1.1630
       20140122   7.4993   1.2248   1.1630
600004 20130331  11.8429   3.0816   2.1637
       20130630  24.6232   6.2152   4.5135
       20130930  37.9673   9.2088   6.6463
       20140122  37.9673   9.2088   6.6463
600809 20130331  27.9517   9.9426   7.5182
       20130630  40.6460  13.9414   9.8572
       20130930  53.0501  16.8081  11.8605
       20140122  53.0501  16.8081  11.8605

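This is not what I want: it forward-fills every NaN row inside a group, not just the last one. So how can I fill only the last row of each group in Pandas?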

1 Answer:

Answer 0: (score: 5)

You can pass a custom function to the groupby:

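def f(g):
    g = g.copy()  # avoid mutating the group that groupby passes in
    last = len(g) - 1
    # Overwrite the group's last row with the row just before it.
    g.iloc[last, :] = g.iloc[last - 1, :]
    return g
# group_keys=False keeps the original index in recent pandas versions.
print(df.groupby(level=0, group_keys=False).apply(f))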

Output:

                   sales  opr_pft  net_pft
STK_ID RPT_Date                           
2138   20130331   2.0703   0.3373   0.2829
       20130630      NaN      NaN      NaN
       20130930   7.4993   1.2248   1.1630
       20140122   7.4993   1.2248   1.1630
600004 20130331  11.8429   3.0816   2.1637
       20130630  24.6232   6.2152   4.5135
       20130930  37.9673   9.2088   6.6463
       20140122  37.9673   9.2088   6.6463
600809 20130331  27.9517   9.9426   7.5182
       20130630  40.6460  13.9414   9.8572
       20130930  53.0501  16.8081  11.8605
       20140122  53.0501  16.8081  11.8605
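
As a possible alternative that avoids calling a Python function per group, the sketch below rebuilds the question's frame and fills only each group's last row. It assumes STK_ID is stored as strings and that rows are already ordered by RPT_Date within each group; the names `filled`, `is_last` and `result` are illustrative and not from the original post. Unlike `f` above, it only fills cells that are NaN in the last row, and it uses `ffill()` because `fillna(method='ffill')` is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question (STK_ID as strings to keep leading zeros).
index = pd.MultiIndex.from_product(
    [["002138", "600004", "600809"], [20130331, 20130630, 20130930, 20140122]],
    names=["STK_ID", "RPT_Date"],
)
nan = np.nan
df = pd.DataFrame(
    [
        [2.0703, 0.3373, 0.2829], [nan, nan, nan],
        [7.4993, 1.2248, 1.1630], [nan, nan, nan],
        [11.8429, 3.0816, 2.1637], [24.6232, 6.2152, 4.5135],
        [37.9673, 9.2088, 6.6463], [nan, nan, nan],
        [27.9517, 9.9426, 7.5182], [40.6460, 13.9414, 9.8572],
        [53.0501, 16.8081, 11.8605], [nan, nan, nan],
    ],
    index=index,
    columns=["sales", "opr_pft", "net_pft"],
)

# Forward-fill inside each group, but by at most one row, so a NaN cell is
# only filled when the row directly above it (same group) has a value.
filled = df.groupby(level=0).ffill(limit=1)

# Boolean array that is True on the last occurrence of each STK_ID.
is_last = ~df.index.get_level_values(0).duplicated(keep="last")

# Take the filled values for the last row of each group only;
# every other row keeps its original content, NaN included.
result = df.copy()
result.loc[is_last] = filled.loc[is_last]

print(result)
```

Here `result` should match the desired table from the question, with the 20130630 row of 002138 still NaN, and `df` itself is left unchanged.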