Filter a pandas DataFrame by date and group

Time: 2019-12-12 02:19:26

Tags: pandas filter pandas-groupby

I have the following DataFrame:

Date    group   File1   File2   Begin Date  End Date
4/28/2014   A   CC2015H CC2015K 5/1/2014    2/2/2015
4/29/2014   A   CC2015H CC2015K 5/1/2014    2/2/2015
4/30/2014   A   CC2015H CC2015K 5/1/2014    2/2/2015
5/1/2014    A   CC2015H CC2015K 5/1/2014    2/2/2015
5/2/2014    A   CC2015H CC2015K 5/1/2014    2/2/2015
1/22/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/23/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/26/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/27/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/28/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/29/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/30/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
2/2/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/3/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/4/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/5/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/6/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
8/25/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/26/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/27/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/28/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/29/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
9/2/2014    B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/7/2015    B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/10/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/11/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/12/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/13/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/14/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/17/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/18/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/19/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/20/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015

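It is actually a much larger DataFrame with more groups; I shortened it for display purposes. I am trying to filter the DataFrame on the Date column as follows: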

df = df.loc[df.groupby(['group','File1', 'File2']).df['Date'] >= df.groupby(['group', 'File1', 'File2'])['Begin Date']

The output should look like this:

Date    group   File1   File2   Begin Date  End Date
5/1/2014    A   CC2015H CC2015K 5/1/2014    2/2/2015
5/2/2014    A   CC2015H CC2015K 5/1/2014    2/2/2015
1/22/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/23/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/26/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/27/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/28/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/29/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
1/30/2015   A   CC2015H CC2015K 5/1/2014    2/2/2015
2/2/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/3/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/4/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/5/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
2/6/2015    A   CC2015H CC2015K 5/1/2014    2/2/2015
8/29/2014   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
9/2/2014    B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/7/2015    B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/10/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/11/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/12/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/13/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/14/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/17/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/18/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/19/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015
8/20/2015   B   ZC2015U ZC2015Z 8/29/2014   8/14/2015

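Bonus question: I would also like to filter by both the begin date and the end date, i.e. keep the rows in each group that satisfy the condition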

(df['Date'] >= df['Begin Date']) & (df['Date'] <= df['End Date'])

Thanks for any help or suggestions.

1 Answer:

Answer 0 (score: 0)

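I don't think you need groupby here, because you are not aggregating anything within each group (min, max, sum, count, etc.). Begin Date and End Date are already present on every row, so a plain row-wise comparison is enough.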

between is what you are looking for:

df[df['Date'].between(df['Begin Date'], df['End Date'])]
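
One caveat: the comparison is only meaningful on real datetimes. If the three date columns are still strings like 4/28/2014, they get compared lexicographically. Here is a minimal sketch of both filters, assuming the columns are month/day/year strings; the small sample frame and the names after_begin and in_window are made up for illustration, with values taken from the question's data.

```python
import pandas as pd

# Hypothetical sample: a few rows in the same shape as the question's data.
df = pd.DataFrame({
    "Date": ["4/30/2014", "5/1/2014", "2/2/2015", "2/3/2015",
             "8/28/2014", "8/29/2014", "8/17/2015"],
    "group": ["A", "A", "A", "A", "B", "B", "B"],
    "Begin Date": ["5/1/2014"] * 4 + ["8/29/2014"] * 3,
    "End Date": ["2/2/2015"] * 4 + ["8/14/2015"] * 3,
})

# Assumption: the dates are month/day/year strings, so convert them first.
for col in ["Date", "Begin Date", "End Date"]:
    df[col] = pd.to_datetime(df[col], format="%m/%d/%Y")

# Main question: keep rows on or after each row's own Begin Date.
after_begin = df[df["Date"] >= df["Begin Date"]]

# Bonus question: keep rows inside [Begin Date, End Date], both ends included.
in_window = df[df["Date"].between(df["Begin Date"], df["End Date"])]

print(after_begin)
print(in_window)
```

Note that the expected output in the question keeps rows after End Date (for example 2/3/2015 in group A), so it corresponds to after_begin; between answers the bonus question.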