I want to mark the last record in each group ['ru', 'case', 'opdate'], based on 'lst_svc'.
Here is my attempt (it is wrong, because duplicates get counted twice).
import pandas as pd
from datetime import datetime
# Create DataFrame
d = {'ru': pd.Series([1., 1., 1., 1., 3, 3]),
     'case': pd.Series([2., 2., 2., 2., 2, 2]),
     'opdate': pd.Series([datetime(2012, 5, 2), datetime(2012, 5, 2), datetime(2012, 5, 2),
                          datetime(2012, 5, 2), datetime(2012, 5, 3), datetime(2012, 5, 3)]),
     # Only five values: the sixth row gets NaT when the DataFrame is built
     'lst_svc': pd.Series([datetime(2012, 5, 2), datetime(2012, 5, 3), datetime(2012, 5, 5),
                           datetime(2012, 5, 5), datetime(2012, 6, 5)])}
df = pd.DataFrame(d)
# Mark last (flawed: every row equal to the group maximum is marked, so ties are marked twice)
df['lastMark'] = (df.groupby(['ru', 'case', 'opdate'])['lst_svc'].transform('max') == df['lst_svc']).astype(int)
The DataFrame looks like this:
   case    lst_svc     opdate   ru
0   2.0 2012-05-02 2012-05-02  1.0
1   2.0 2012-05-03 2012-05-02  1.0
2   2.0 2012-05-05 2012-05-02  1.0
3   2.0 2012-05-05 2012-05-02  1.0
4   2.0 2012-06-05 2012-05-03  3.0
5   2.0        NaT 2012-05-03  3.0
The (incorrect) result of my code looks like this:
   case    lst_svc     opdate   ru  lastMark
0   2.0 2012-05-02 2012-05-02  1.0         0
1   2.0 2012-05-03 2012-05-02  1.0         0
2   2.0 2012-05-05 2012-05-02  1.0         1
3   2.0 2012-05-05 2012-05-02  1.0         1
4   2.0 2012-06-05 2012-05-03  3.0         1
5   2.0        NaT 2012-05-03  3.0         0
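To make the double counting concrete, here is a minimal sketch that rebuilds the sample data and sums the marks per group. The names `keys` and `per_group` are only illustrative and do not appear in the original attempt. Comparing each row against the group maximum is true for every row that ties for it, so the first group ends up with two marks instead of one.

```python
import pandas as pd
from datetime import datetime

# Same sample data as above; the missing sixth lst_svc is written as NaT explicitly.
df = pd.DataFrame({
    'ru': [1., 1., 1., 1., 3., 3.],
    'case': [2., 2., 2., 2., 2., 2.],
    'opdate': [datetime(2012, 5, 2)] * 4 + [datetime(2012, 5, 3)] * 2,
    'lst_svc': [datetime(2012, 5, 2), datetime(2012, 5, 3), datetime(2012, 5, 5),
                datetime(2012, 5, 5), datetime(2012, 6, 5), pd.NaT],
})

keys = ['ru', 'case', 'opdate']  # illustrative name for the grouping columns

# The original comparison: true for every row equal to its group's maximum.
df['lastMark'] = (df.groupby(keys)['lst_svc'].transform('max') == df['lst_svc']).astype(int)

# Number of marked rows per group; a correct "last" marker would give 1 everywhere.
per_group = df.groupby(keys)['lastMark'].sum()
print(per_group)
```

Rows 2 and 3 share the latest date in their group, which is why that group's total is 2.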
Answer 0 (score: 4)
How about:
import pandas as pd
from datetime import datetime
# Create example DataFrame
d = {'ru': pd.Series([1., 1., 1., 1., 3, 3]),
     'case': pd.Series([2., 2., 2., 2., 2, 2]),
     'opdate': pd.Series([datetime(2012, 5, 2), datetime(2012, 5, 2), datetime(2012, 5, 2),
                          datetime(2012, 5, 2), datetime(2012, 5, 3), datetime(2012, 5, 3)]),
     'lst_svc': pd.Series([datetime(2012, 5, 2), datetime(2012, 5, 3), datetime(2012, 5, 5),
                           datetime(2012, 5, 5), datetime(2012, 6, 5)])}
df = pd.DataFrame(d)
# Mark last
def f(s):
    # Start with zeros on the group's index, then flag the group's final row
    s2 = pd.Series(0, index=s.index)
    s2.iloc[-1] = 1
    return s2
# group_keys=False keeps the original row index so the result aligns with df
df["lastMark"] = df.groupby(['ru', 'case', 'opdate'], group_keys=False)['lst_svc'].apply(f)
The output is as follows:
   case    lst_svc     opdate   ru  lastMark
0   2.0 2012-05-02 2012-05-02  1.0         0
1   2.0 2012-05-03 2012-05-02  1.0         0
2   2.0 2012-05-05 2012-05-02  1.0         0
3   2.0 2012-05-05 2012-05-02  1.0         1
4   2.0 2012-06-05 2012-05-03  3.0         0
5   2.0        NaT 2012-05-03  3.0         1
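Note that the approach above flags the last row of each group by position, so it relies on the rows already being in the desired order. If the marker should instead follow the latest 'lst_svc' value, one possible variant (a sketch, not the answer's method) is to sort by 'lst_svc' and flag the last occurrence of each key combination. It assumes that NaT should never count as the latest date and that ties go to the row that appears later; the names `keys`, `ordered` and `is_last` are illustrative.

```python
import pandas as pd
from datetime import datetime

# Same sample data; the missing sixth lst_svc is written as NaT explicitly.
df = pd.DataFrame({
    'ru': [1., 1., 1., 1., 3., 3.],
    'case': [2., 2., 2., 2., 2., 2.],
    'opdate': [datetime(2012, 5, 2)] * 4 + [datetime(2012, 5, 3)] * 2,
    'lst_svc': [datetime(2012, 5, 2), datetime(2012, 5, 3), datetime(2012, 5, 5),
                datetime(2012, 5, 5), datetime(2012, 6, 5), pd.NaT],
})

keys = ['ru', 'case', 'opdate']  # illustrative name for the grouping columns

# Stable sort by lst_svc with NaT first (assumption: a missing date is never
# the "last" record unless the whole group is missing). Ties keep row order.
ordered = df.sort_values('lst_svc', kind='mergesort', na_position='first')

# duplicated(keep='last') is False only for the final occurrence of each key
# combination, which after sorting is the row with the latest lst_svc.
is_last = ~ordered.duplicated(subset=keys, keep='last')

# Assignment aligns on the index, so df keeps its original row order.
df['lastMark'] = is_last.astype(int)
print(df)
```

On the sample data this marks rows 3 and 4 rather than rows 3 and 5, because row 5 has no 'lst_svc' value. Whether that is the intended behaviour depends on how missing dates should be treated.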