A B C
0 2002-01-16 2002-02-28 Jack
1 2002-01-16 2002-01-30 Helen
2 2002-01-16 2002-02-28 Peter
3 2002-01-16 2002-01-30 Jud
4 2002-04-27 2002-04-30 Nick
5 2002-04-27 2002-05-25 Wendy
6 2002-04-27 2002-04-30 Bryan
7 2002-04-27 2002-05-25 Sarah
我想根据每个A
群体选择B
日期与A
日期相近的行。
输出应为:
A B C
1 2002-01-16 2002-01-30 Helen
3 2002-01-16 2002-01-30 Jud
4 2002-04-27 2002-04-30 Nick
6 2002-04-27 2002-04-30 Bryan
答案 0 :(得分:3)
使用:
df = df[df['B'].sub(df['A']).groupby(df['A']).transform(lambda x: x == x.min())]
print (df)
A B C
1 2002-01-16 2002-01-30 Helen
3 2002-01-16 2002-01-30 Jud
4 2002-04-27 2002-04-30 Nick
6 2002-04-27 2002-04-30 Bryan
<强>详情:
print (df['B'].sub(df['A']))
0 43 days
1 14 days
2 43 days
3 14 days
4 3 days
5 28 days
6 3 days
7 28 days
dtype: timedelta64[ns]
print (df['B'].sub(df['A']).groupby(df['A']).transform(lambda x: x == x.min()))
0 False
1 True
2 False
3 True
4 True
5 False
6 True
7 False
dtype: bool
答案 1 :(得分:1)
这是一种方式。
# convert columns to datetime
df[['A', 'B']] = df[['A', 'B']].apply(pd.to_datetime)
# calculate absolute difference
df['Diff'] = (df['B'] - df['A']).abs()
# filter for difference equal to mapped minimum
res = df.loc[df['Diff'] == df['A'].map(df.groupby('A')['Diff'].min())]
结果:
A B C Diff
1 2002-01-16 2002-01-30 Helen 14 days
3 2002-01-16 2002-01-30 Jud 14 days
4 2002-04-27 2002-04-30 Nick 3 days
6 2002-04-27 2002-04-30 Bryan 3 days