A B C
0 2002-01-16 2002-02-28 HH
1 2002-01-16 2002-01-30 DX
2 2002-01-16 2002-02-28 TY
3 2002-01-16 2002-01-30 FY
4 2002-04-28 2002-04-30 PE
5 2002-04-28 2002-05-25 CO
6 2002-04-28 2002-04-30 OL
7 2002-04-28 2002-05-25 DS
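For each A group, I want to select the rows whose B date is closest to the A date while still being more than 2 days apart from it. The output should be: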
A B C
1 2002-01-16 2002-01-30 DX
3 2002-01-16 2002-01-30 FY
5 2002-04-28 2002-05-25 CO
7 2002-04-28 2002-05-25 DS
I tried:
df['Diff'] = (df['B'] - df['A']).abs()
df.loc[df['Diff'] == df['A'].map(df.groupby('A')['Diff'].min())]
Answer 0 (score: 3)
Try this:
df[df['Diff'] == df['A'].map(df[df.Diff > pd.Timedelta('2 days')]
.groupby('A')['Diff'].min())]
Output:
A B C Diff
1 2002-01-16 2002-01-30 DX 14 days
3 2002-01-16 2002-01-30 FY 14 days
5 2002-04-28 2002-05-25 CO 27 days
7 2002-04-28 2002-05-25 DS 27 days
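For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question and applies the same filter. It assumes A and B are already datetime64 columns; the names min_diff and result are only illustrative and do not appear in the answer.

```python
import pandas as pd

# Sample data copied from the question; A and B are parsed as datetimes
df = pd.DataFrame({
    "A": pd.to_datetime(["2002-01-16"] * 4 + ["2002-04-28"] * 4),
    "B": pd.to_datetime([
        "2002-02-28", "2002-01-30", "2002-02-28", "2002-01-30",
        "2002-04-30", "2002-05-25", "2002-04-30", "2002-05-25",
    ]),
    "C": ["HH", "DX", "TY", "FY", "PE", "CO", "OL", "DS"],
})

# Absolute gap between the two dates, as in the question
df["Diff"] = (df["B"] - df["A"]).abs()

# Smallest gap per A group, looking only at gaps strictly longer than 2 days
# (min_diff is an illustrative name)
min_diff = df[df["Diff"] > pd.Timedelta("2 days")].groupby("A")["Diff"].min()

# Keep the rows whose gap equals that per-group minimum
result = df[df["Diff"] == df["A"].map(min_diff)]
print(result)
```

The 2-day rows (PE and OL) are dropped before the minimum is taken, which is why the original attempt without that filter picks them instead.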
Answer 1 (score: 3)
I would do it in two steps:
s = df[(df.B - df.A).dt.days > 2]  # 1st condition: gap longer than 2 days
s[(s.B - s.A).dt.days.groupby(s.A).transform(lambda x: x == x.min())]  # 2nd condition: smallest remaining gap per A group
Out[1396]:
A B C
1 2002-01-16 2002-01-30 DX
3 2002-01-16 2002-01-30 FY
5 2002-04-28 2002-05-25 CO
7 2002-04-28 2002-05-25 DS
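The two-step approach above can also be written without the lambda, by comparing each gap against the group minimum from transform("min"). This is only a sketch of a variant, not the answerer's code; it assumes A and B are datetime64 columns, and the names days, s_days and out are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question; A and B are parsed as datetimes
df = pd.DataFrame({
    "A": pd.to_datetime(["2002-01-16"] * 4 + ["2002-04-28"] * 4),
    "B": pd.to_datetime([
        "2002-02-28", "2002-01-30", "2002-02-28", "2002-01-30",
        "2002-04-30", "2002-05-25", "2002-04-30", "2002-05-25",
    ]),
    "C": ["HH", "DX", "TY", "FY", "PE", "CO", "OL", "DS"],
})

# Gap in whole days between B and A (illustrative name)
days = (df["B"] - df["A"]).dt.days

# Step 1: keep only rows whose gap is longer than 2 days
s = df[days > 2]
s_days = days.loc[s.index]

# Step 2: within each A group, keep rows whose gap equals the group minimum
out = s[s_days == s_days.groupby(s["A"]).transform("min")]
print(out)
```

Because no Diff column is added here, the result keeps only the original A, B and C columns.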