Mark the last record in each group by date (pandas, Python)

Time: 2013-12-27 02:04:13

Tags: python pandas

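I want to mark the last record within each group ['ru', 'case', 'opdate'], where "last" is determined by 'lst_svc'.

Here is my attempt. It is wrong because rows that tie on the latest 'lst_svc' are both marked.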

import pandas as pd
from datetime import datetime

# Create DataFrame
d = {'ru'     : pd.Series([1., 1., 1., 1., 3, 3]),
     'case'   : pd.Series([2., 2., 2., 2., 2, 2]),
     'opdate' : pd.Series([datetime(2012, 5, 2), datetime(2012, 5, 2), datetime(2012, 5, 2),datetime(2012, 5, 2), datetime(2012, 5, 3),datetime(2012, 5, 3)]),
     'lst_svc': pd.Series([datetime(2012, 5, 2), datetime(2012, 5, 3), datetime(2012, 5, 5),datetime(2012, 5, 5),datetime(2012, 6, 5),])}

df = pd.DataFrame(d)

# Mark last: flag every row whose lst_svc equals its group's maximum
df['lastMark'] = (df.groupby(['ru','case','opdate'])['lst_svc'].transform('max') == df['lst_svc']).astype(int)

The DataFrame looks like this:

   case    lst_svc     opdate   ru
0   2.0 2012-05-02 2012-05-02  1.0
1   2.0 2012-05-03 2012-05-02  1.0
2   2.0 2012-05-05 2012-05-02  1.0
3   2.0 2012-05-05 2012-05-02  1.0
4   2.0 2012-06-05 2012-05-03  3.0
5   2.0        NaT 2012-05-03  3.0

The (incorrect) result of my code looks like this:

   case    lst_svc     opdate   ru  lastMark
0   2.0 2012-05-02 2012-05-02  1.0         0
1   2.0 2012-05-03 2012-05-02  1.0         0
2   2.0 2012-05-05 2012-05-02  1.0         1
3   2.0 2012-05-05 2012-05-02  1.0         1
4   2.0 2012-06-05 2012-05-03  3.0         1
5   2.0        NaT 2012-05-03  3.0         0
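
The double marking can be confirmed by counting the marked rows per group. This is only a minimal sketch on the same example data; the name `marks_per_group` is illustrative and not part of the original code.

```python
import pandas as pd
from datetime import datetime

# Same example data as above; the last lst_svc value is missing (NaT).
df = pd.DataFrame({
    'ru': [1., 1., 1., 1., 3., 3.],
    'case': [2., 2., 2., 2., 2., 2.],
    'opdate': [datetime(2012, 5, 2)] * 4 + [datetime(2012, 5, 3)] * 2,
    'lst_svc': [datetime(2012, 5, 2), datetime(2012, 5, 3), datetime(2012, 5, 5),
                datetime(2012, 5, 5), datetime(2012, 6, 5), pd.NaT],
})

keys = ['ru', 'case', 'opdate']

# The attempt from the question: compare each row with its group's maximum.
df['lastMark'] = (df.groupby(keys)['lst_svc'].transform('max') == df['lst_svc']).astype(int)

# Count marks per group; any value above 1 means tied rows were all flagged.
marks_per_group = df.groupby(keys)['lastMark'].sum()
print(marks_per_group)
```

The first group shows two marks because rows 2 and 3 share the same latest 'lst_svc'.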

1 answer:

Answer 0 (score: 4)

How about:

import pandas as pd
from datetime import datetime

# Create example DataFrame
d = {'ru'     : pd.Series([1., 1., 1., 1., 3, 3]),
     'case'   : pd.Series([2., 2., 2., 2., 2, 2]),
     'opdate' : pd.Series([datetime(2012, 5, 2), datetime(2012, 5, 2), datetime(2012, 5, 2),datetime(2012, 5, 2), datetime(2012, 5, 3),datetime(2012, 5, 3)]),
     'lst_svc': pd.Series([datetime(2012, 5, 2), datetime(2012, 5, 3), datetime(2012, 5, 5),datetime(2012, 5, 5),datetime(2012, 6, 5),])}

df = pd.DataFrame(d)

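# Mark last
def f(s):
    # Build a Series of zeros aligned to the group's index,
    # then flag the last row of the group (by position) with 1.
    s2 = pd.Series(0, index=s.index)
    s2.iloc[-1] = 1
    return s2

# group_keys=False keeps the original row index so the result aligns with df
df["lastMark"] = df.groupby(['ru','case','opdate'], group_keys=False)['lst_svc'].apply(f)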

The output is:

   case    lst_svc     opdate   ru  lastMark
0   2.0 2012-05-02 2012-05-02  1.0         0
1   2.0 2012-05-03 2012-05-02  1.0         0
2   2.0 2012-05-05 2012-05-02  1.0         0
3   2.0 2012-05-05 2012-05-02  1.0         1
4   2.0 2012-06-05 2012-05-03  3.0         0
5   2.0        NaT 2012-05-03  3.0         1
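
Note that this marks the last row of each group by position, so the NaT row is flagged in the second group. As a hedged alternative, the sketch below shows two vectorized variants: one that reproduces the positional result with `duplicated`, and one that picks the row with the latest 'lst_svc' per group. The column name `lastByDate`, and the choice to never prefer a NaT row over a dated one, are assumptions made for illustration and are not part of the original answer.

```python
import pandas as pd
from datetime import datetime

df = pd.DataFrame({
    'ru': [1., 1., 1., 1., 3., 3.],
    'case': [2., 2., 2., 2., 2., 2.],
    'opdate': [datetime(2012, 5, 2)] * 4 + [datetime(2012, 5, 3)] * 2,
    'lst_svc': [datetime(2012, 5, 2), datetime(2012, 5, 3), datetime(2012, 5, 5),
                datetime(2012, 5, 5), datetime(2012, 6, 5), pd.NaT],
})

keys = ['ru', 'case', 'opdate']

# Variant 1 (positional): a row is "last" if no later row has the same keys.
df['lastMark'] = (~df.duplicated(subset=keys, keep='last')).astype(int)

# Variant 2 (by date, assumption): sort by lst_svc with NaT first, using a
# stable sort so tied dates keep their original order, then take the final
# row of each group.
ordered = df.sort_values('lst_svc', kind='mergesort', na_position='first')
last_idx = ordered.groupby(keys).tail(1).index
df['lastByDate'] = df.index.isin(last_idx).astype(int)

print(df)
```

With the example data, `lastMark` flags rows 3 and 5, while `lastByDate` flags rows 3 and 4, so exactly one row per group is marked in either case.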