Selecting rows per group with a time condition

Time: 2018-04-05 20:03:08

Tags: python pandas

           A          B   C
0 2002-01-16 2002-02-28  HH
1 2002-01-16 2002-01-30  DX
2 2002-01-16 2002-02-28  TY
3 2002-01-16 2002-01-30  FY
4 2002-04-28 2002-04-30  PE
5 2002-04-28 2002-05-25  CO
6 2002-04-28 2002-04-30  OL
7 2002-04-28 2002-05-25  DS

For each A group, I want to select the rows whose A date is:

  • closest to the B date, and
  • more than two days away from the B date.

The output should be:

           A          B   C
1 2002-01-16 2002-01-30  DX
3 2002-01-16 2002-01-30  FY
5 2002-04-28 2002-05-25  CO
7 2002-04-28 2002-05-25  DS

I tried:

df['Diff'] = (df['B'] - df['A']).abs()
df.loc[df['Diff'] == df['A'].map(df.groupby('A')['Diff'].min())]
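For reference, here is a minimal reproducible sketch of the attempt above. The DataFrame construction is an assumption reconstructed from the printed table, and the names `group_min` and `attempt` are introduced only for illustration. It suggests why the attempt falls short: the per-group minimum is taken over all rows, so the 2-day gaps in the second group win.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (assumed dtypes: datetime64).
df = pd.DataFrame({
    "A": pd.to_datetime(["2002-01-16"] * 4 + ["2002-04-28"] * 4),
    "B": pd.to_datetime([
        "2002-02-28", "2002-01-30", "2002-02-28", "2002-01-30",
        "2002-04-30", "2002-05-25", "2002-04-30", "2002-05-25",
    ]),
    "C": ["HH", "DX", "TY", "FY", "PE", "CO", "OL", "DS"],
})

# Absolute gap between the two dates, as in the attempt.
df["Diff"] = (df["B"] - df["A"]).abs()

# Smallest gap per A group, with no two-day filter applied.
group_min = df.groupby("A")["Diff"].min()

# Rows whose gap equals their group's minimum.
attempt = df.loc[df["Diff"] == df["A"].map(group_min)]
print(attempt)
```

With this data the attempt keeps rows 1, 3, 4 and 6 rather than the desired 1, 3, 5 and 7, because rows 4 and 6 have a gap of exactly two days.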

2 answers:

Answer 0 (score: 3)

Try this:

df[df['Diff'] == df['A'].map(df[df.Diff > pd.Timedelta('2 days')]
                              .groupby('A')['Diff'].min())]

Output:

           A          B   C    Diff
1 2002-01-16 2002-01-30  DX 14 days
3 2002-01-16 2002-01-30  FY 14 days
5 2002-04-28 2002-05-25  CO 27 days
7 2002-04-28 2002-05-25  DS 27 days
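The one-liner above can be unpacked into separate steps, which may make it easier to follow. This is only a sketch: the DataFrame is rebuilt from the table in the question, and the intermediate names `eligible`, `min_per_group` and `result` are hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (an assumption).
df = pd.DataFrame({
    "A": pd.to_datetime(["2002-01-16"] * 4 + ["2002-04-28"] * 4),
    "B": pd.to_datetime([
        "2002-02-28", "2002-01-30", "2002-02-28", "2002-01-30",
        "2002-04-30", "2002-05-25", "2002-04-30", "2002-05-25",
    ]),
    "C": ["HH", "DX", "TY", "FY", "PE", "CO", "OL", "DS"],
})
df["Diff"] = (df["B"] - df["A"]).abs()

# Step 1: drop rows whose gap is two days or less.
eligible = df[df["Diff"] > pd.Timedelta("2 days")]

# Step 2: smallest remaining gap for each A group.
min_per_group = eligible.groupby("A")["Diff"].min()

# Step 3: look up each row's group minimum and keep the rows that match it.
result = df[df["Diff"] == df["A"].map(min_per_group)]
print(result)
```

Because the minimum is computed only over gaps greater than two days, a row with a shorter gap can never equal it. A group with no eligible row would get a missing value from `map` and drop out entirely.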

Answer 1 (score: 3)

I would do it in two steps:

s = df[(df.B - df.A).dt.days > 2]  # 1st condition: more than two days between A and B
s[(s.B - s.A).dt.days.groupby(s.A).transform(lambda x: x == x.min())]  # 2nd condition: smallest gap within each A group
Out[1396]: 
           A          B   C
1 2002-01-16 2002-01-30  DX
3 2002-01-16 2002-01-30  FY
5 2002-04-28 2002-05-25  CO
7 2002-04-28 2002-05-25  DS
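A possible variation on the two-step approach, shown here as a self-contained sketch, replaces the lambda with the built-in `"min"` aggregation in `transform` and compares afterwards. This is a swapped-in variant rather than the answer's exact code; the DataFrame is rebuilt from the question's table and the names `gap`, `s_gap` and `result` are hypothetical.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (an assumption).
df = pd.DataFrame({
    "A": pd.to_datetime(["2002-01-16"] * 4 + ["2002-04-28"] * 4),
    "B": pd.to_datetime([
        "2002-02-28", "2002-01-30", "2002-02-28", "2002-01-30",
        "2002-04-30", "2002-05-25", "2002-04-30", "2002-05-25",
    ]),
    "C": ["HH", "DX", "TY", "FY", "PE", "CO", "OL", "DS"],
})

# Gap in whole days between B and A.
gap = (df["B"] - df["A"]).dt.days

# 1st condition: keep rows with a gap of more than two days.
s = df[gap > 2]
s_gap = gap.loc[s.index]

# 2nd condition: keep rows whose gap equals the minimum of their A group.
result = s[s_gap == s_gap.groupby(s["A"]).transform("min")]
print(result)
```

Since no helper column is added to `df`, the result has only the original A, B and C columns, matching the output requested in the question.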