Pandas: filter a DataFrame based on a date range and another column

Time: 2019-04-12 17:29:52

Tags: python pandas date

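I have a pandas DataFrame called df1 and want to filter it using conditions from a second DataFrame, df2: for each grp_id, I only want the rows whose year is on or after the start year given in df2's year column, up to the most recent year shown in df1 (2016). This is just a subset of my data; in total I have at least 10 unique grp_id values to subset, each with a different start year.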

df1:

       db_id           cert_status grp_id       year   cap prov
130   IX-011  not-certified member     SD 2004-01-01  30.0   KB
131   IX-011  not-certified member     SD 2005-01-01  30.0   KB
132   IX-011  not-certified member     SD 2006-01-01  30.0   KB
133   IX-011  not-certified member     SD 2007-01-01  30.0   KB
134   IX-011  not-certified member     SD 2008-01-01  30.0   KB
135   IX-011  not-certified member     SD 2009-01-01  30.0   KB
136   IX-011  not-certified member     SD 2010-01-01  30.0   KB
137   IX-011  not-certified member     SD 2011-01-01  30.0   KB
138   IX-011  not-certified member     SD 2012-01-01  30.0   KB
139   IX-011  not-certified member     SD 2013-01-01  30.0   KB
140   IX-011  not-certified member     SD 2014-01-01  30.0   KB
141   IX-011  not-certified member     SD 2015-01-01  30.0   KB
142   IX-011  not-certified member     SD 2016-01-01  30.0   KB
208   IX-017  not-certified member     CG 2004-01-01  30.0   KB
209   IX-017  not-certified member     CG 2005-01-01  30.0   KB
210   IX-017  not-certified member     CG 2006-01-01  30.0   KB
211   IX-017  not-certified member     CG 2007-01-01  30.0   KB
212   IX-017  not-certified member     CG 2008-01-01  30.0   KB
213   IX-017  not-certified member     CG 2009-01-01  30.0   KB
214   IX-017  not-certified member     CG 2010-01-01  30.0   KB
215   IX-017  not-certified member     CG 2011-01-01  30.0   KB
216   IX-017  not-certified member     CG 2012-01-01  30.0   KB
217   IX-017  not-certified member     CG 2013-01-01  80.0   KB
218   IX-017  not-certified member     CG 2014-01-01  30.0   KB
219   IX-017  not-certified member     CG 2015-01-01  30.0   KB
220   IX-017  not-certified member     CG 2016-01-01  30.0   KB

df2:

   grp_id member       year
4     SD       Y 2007-01-01
6     CG       Y 2011-01-01
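To make the example reproducible, the two sample frames above can be rebuilt roughly as follows (the large table as df1 and the grp_id/start-year table as df2, matching the names the answer uses). This is a sketch under one assumption not stated in the question: the year column is stored as datetime64, which the printed values suggest.

```python
import pandas as pd

# Assumption: 'year' holds datetime64 values (the printout looks like dates).
years = list(pd.date_range('2004-01-01', '2016-01-01', freq='YS'))  # 13 yearly start dates

df1 = pd.DataFrame(
    {
        'db_id': ['IX-011'] * 13 + ['IX-017'] * 13,
        'cert_status': ['not-certified member'] * 26,
        'grp_id': ['SD'] * 13 + ['CG'] * 13,
        'year': years * 2,
        # Only CG's 2013 row has cap 80.0
        'cap': [30.0] * 13 + [30.0] * 9 + [80.0] + [30.0] * 3,
        'prov': ['KB'] * 26,
    },
    index=list(range(130, 143)) + list(range(208, 221)),  # original row labels
)

df2 = pd.DataFrame(
    {
        'grp_id': ['SD', 'CG'],
        'member': ['Y', 'Y'],
        'year': pd.to_datetime(['2007-01-01', '2011-01-01']),  # start year per group
    },
    index=[4, 6],
)
```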

What is the simplest and fastest way to do this?

1 answer:

Answer 0 (score: 2)

Try combining merge with query to do the filtering:

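(df1.reset_index()  # keep df1's original row labels in an 'index' column
    .merge(df2[['grp_id', 'year']], on='grp_id', suffixes=('', '_2'))
    .query('year >= year_2')  # year_2 is each group's start year from df2
    .set_index('index').rename_axis(None)[df1.columns])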

Output:

      db_id           cert_status grp_id        year   cap prov
133  IX-011  not-certified member     SD  2007-01-01  30.0   KB
134  IX-011  not-certified member     SD  2008-01-01  30.0   KB
135  IX-011  not-certified member     SD  2009-01-01  30.0   KB
136  IX-011  not-certified member     SD  2010-01-01  30.0   KB
137  IX-011  not-certified member     SD  2011-01-01  30.0   KB
138  IX-011  not-certified member     SD  2012-01-01  30.0   KB
139  IX-011  not-certified member     SD  2013-01-01  30.0   KB
140  IX-011  not-certified member     SD  2014-01-01  30.0   KB
141  IX-011  not-certified member     SD  2015-01-01  30.0   KB
142  IX-011  not-certified member     SD  2016-01-01  30.0   KB
215  IX-017  not-certified member     CG  2011-01-01  30.0   KB
216  IX-017  not-certified member     CG  2012-01-01  30.0   KB
217  IX-017  not-certified member     CG  2013-01-01  80.0   KB
218  IX-017  not-certified member     CG  2014-01-01  30.0   KB
219  IX-017  not-certified member     CG  2015-01-01  30.0   KB
220  IX-017  not-certified member     CG  2016-01-01  30.0   KB
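A possible alternative that skips the merge, and with it the index bookkeeping, is a boolean mask built with Series.map. This is a swapped-in technique rather than the answer's own, sketched under two assumptions: df2 has exactly one row per grp_id (map needs a unique lookup index), and year is datetime64 in both frames. Since the title mentions a date range, the sketch also shows Series.between with a per-row lower bound and an explicit upper bound; end_date is a hypothetical name, and between treats both bounds as inclusive by default.

```python
import pandas as pd

# Rebuild the sample data. Assumption: 'year' is datetime64 in both frames.
years = list(pd.date_range('2004-01-01', '2016-01-01', freq='YS'))
df1 = pd.DataFrame(
    {
        'db_id': ['IX-011'] * 13 + ['IX-017'] * 13,
        'cert_status': ['not-certified member'] * 26,
        'grp_id': ['SD'] * 13 + ['CG'] * 13,
        'year': years * 2,
        'cap': [30.0] * 22 + [80.0] + [30.0] * 3,
        'prov': ['KB'] * 26,
    },
    index=list(range(130, 143)) + list(range(208, 221)),
)
df2 = pd.DataFrame(
    {
        'grp_id': ['SD', 'CG'],
        'member': ['Y', 'Y'],
        'year': pd.to_datetime(['2007-01-01', '2011-01-01']),
    },
    index=[4, 6],
)

# Per-row start year looked up from df2 (map needs grp_id to be unique in df2).
start_year = df1['grp_id'].map(df2.set_index('grp_id')['year'])

# Open-ended filter: rows on or after their group's start year.
# Groups missing from df2 get NaT, and NaT comparisons are False, so they drop out.
result = df1[df1['year'] >= start_year]

# Closed range: same lower bound plus a hypothetical explicit end date (inclusive).
end_date = pd.Timestamp('2016-01-01')
bounded = df1[df1['year'].between(start_year, end_date)]

print(result)
```

Because the mask is aligned on df1's own index, both results keep the original row labels (133, 134, ...) without a reset_index/set_index round trip. With the sample data ending in 2016, result and bounded contain the same 16 rows.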