Select rows by minimum absolute difference between dates, per group

Time: 2018-04-03 16:01:33

Tags: python pandas datetime dataframe

            A           B      C
0  2002-01-16  2002-02-28   Jack
1  2002-01-16  2002-01-30  Helen
2  2002-01-16  2002-02-28  Peter
3  2002-01-16  2002-01-30    Jud
4  2002-04-27  2002-04-30   Nick
5  2002-04-27  2002-05-25  Wendy
6  2002-04-27  2002-04-30  Bryan
7  2002-04-27  2002-05-25  Sarah

Within each group of A, I want to select the rows whose B date is closest to the A date.

The output should be:

            A           B      C
1  2002-01-16  2002-01-30  Helen
3  2002-01-16  2002-01-30    Jud
4  2002-04-27  2002-04-30   Nick
6  2002-04-27  2002-04-30  Bryan

2 answers:

Answer 0 (score: 3)

Use:

df = df[df['B'].sub(df['A']).groupby(df['A']).transform(lambda x: x == x.min())]
print (df)
           A          B      C
1 2002-01-16 2002-01-30  Helen
3 2002-01-16 2002-01-30    Jud
4 2002-04-27 2002-04-30   Nick
6 2002-04-27 2002-04-30  Bryan

Details:

print (df['B'].sub(df['A']))

0   43 days
1   14 days
2   43 days
3   14 days
4    3 days
5   28 days
6    3 days
7   28 days
dtype: timedelta64[ns]

print (df['B'].sub(df['A']).groupby(df['A']).transform(lambda x: x == x.min()))
0    False
1     True
2    False
3     True
4     True
5    False
6     True
7    False
dtype: bool
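Note that the expression above compares signed differences. That matches the absolute difference for the sample data, where every B date falls after its A date, but the two can disagree once a B date precedes A. The following is a minimal sketch of that case; the three rows and the names in column C are hypothetical and not taken from the question, and `transform("min")` is used here in place of the lambda.

```python
import pandas as pd

# Hypothetical data (not from the question): two B dates fall before A.
df = pd.DataFrame({
    "A": pd.to_datetime(["2002-01-16", "2002-01-16", "2002-01-16"]),
    "B": pd.to_datetime(["2001-12-01", "2002-01-20", "2002-01-14"]),
    "C": ["Early", "Late", "Near"],
})

signed = df["B"] - df["A"]   # signed differences, negative when B precedes A
absolute = signed.abs()      # absolute differences

# Keep rows whose difference equals the per-group minimum.
signed_pick = df.loc[signed == signed.groupby(df["A"]).transform("min"), "C"].tolist()
abs_pick = df.loc[absolute == absolute.groupby(df["A"]).transform("min"), "C"].tolist()

print(signed_pick)  # row with the most negative difference
print(abs_pick)     # row closest to A in either direction
```

If B can precede A in the real data, taking `.abs()` before grouping is probably the safer choice.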

Answer 1 (score: 1)

Here is one way.

import pandas as pd

# convert columns to datetime
df[['A', 'B']] = df[['A', 'B']].apply(pd.to_datetime)

# calculate absolute difference
df['Diff'] = (df['B'] - df['A']).abs()

# filter for difference equal to mapped minimum
res = df.loc[df['Diff'] == df['A'].map(df.groupby('A')['Diff'].min())]

Result:

           A          B      C    Diff
1 2002-01-16 2002-01-30  Helen 14 days
3 2002-01-16 2002-01-30    Jud 14 days
4 2002-04-27 2002-04-30   Nick  3 days
6 2002-04-27 2002-04-30  Bryan  3 days
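For anyone who wants to reproduce this, here is a self-contained sketch of the same map-the-group-minimum idea. It rebuilds the sample frame from the question by hand (the construction code is an assumption, since the question only shows the printed table) and keeps the difference in a separate Series rather than adding a `Diff` column.

```python
import pandas as pd

# Sample data from the question, typed in as strings.
df = pd.DataFrame({
    "A": ["2002-01-16"] * 4 + ["2002-04-27"] * 4,
    "B": [
        "2002-02-28", "2002-01-30", "2002-02-28", "2002-01-30",
        "2002-04-30", "2002-05-25", "2002-04-30", "2002-05-25",
    ],
    "C": ["Jack", "Helen", "Peter", "Jud", "Nick", "Wendy", "Bryan", "Sarah"],
})

# Convert both date columns to datetime64.
df[["A", "B"]] = df[["A", "B"]].apply(pd.to_datetime)

# Absolute difference per row, kept outside the frame.
diff = (df["B"] - df["A"]).abs()

# Minimum difference per A value, as a Series indexed by A.
group_min = diff.groupby(df["A"]).min()

# Keep every row whose difference equals its group's minimum (ties are kept).
res = df.loc[diff == df["A"].map(group_min)]
print(res)
```

Because the comparison is an equality against the group minimum, all tied rows survive, which is what the expected output in the question requires.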