I have 2 datasets:
df1:
teamid points startdate enddate
1 30 2017-07-01 2018-06-30
2 41 2016-07-01 2017-06-30
3 32 2016-07-01 2017-06-30
df2:
teamid date color
1 2017-01-02 red
1 2018-01-02 yellow
2 2017-06-05 blue
3 2014-01-05 red
4 2016-03-02 brown
I want to filter df2 to the rows where df2.teamid matches df1.teamid and df2.date falls between that team's df1.startdate and df1.enddate.
I tried several variations of the following:
df2_filtered = df2[(df2['teamid'].isin(df1['teamid'])) & (df2['date'] >= df1['startdate']) & (df2['date'] <= df1['enddate'])]
This gave me ValueError: Can only compare identically-labeled Series objects.
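A minimal sketch reproducing the error, assuming the two tables above are rebuilt as DataFrames with string dates (the reconstruction is illustrative). Comparisons between two Series are aligned by index label, not matched by teamid: df2["date"] has labels 0..4 while df1["startdate"] has labels 0..2, so pandas refuses to compare them. Even with equal lengths, row i of df2 would be compared with row i of df1, which has nothing to do with the team.

```python
import pandas as pd

# Illustrative reconstruction of the question's tables (dates kept as strings).
df1 = pd.DataFrame({
    "teamid": [1, 2, 3],
    "points": [30, 41, 32],
    "startdate": ["2017-07-01", "2016-07-01", "2016-07-01"],
    "enddate": ["2018-06-30", "2017-06-30", "2017-06-30"],
})
df2 = pd.DataFrame({
    "teamid": [1, 1, 2, 3, 4],
    "date": ["2017-01-02", "2018-01-02", "2017-06-05", "2014-01-05", "2016-03-02"],
    "color": ["red", "yellow", "blue", "red", "brown"],
})

message = None
try:
    # Index labels 0..4 vs 0..2: the Series are not identically labeled.
    df2["date"] >= df1["startdate"]
except ValueError as exc:
    message = str(exc)

print(message)
```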
I also tried:
df2_filtered = df2[(df2['teamid'].isin(df1['teamid'])) & (str(df2['date']) >= df1['startdate']) & (str(df2['date']) <= df1['enddate'])]
This returns 0 rows. Based on df1 and df2, the matching rows (rows 2 and 3 of df2) should be returned.
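A small sketch suggests why the str() version matches nothing (the data below is an illustrative copy of the tables above). str(df2['date']) does not convert each value; it returns the printed representation of the whole Series as one string, starting with the first index label. That single string is then compared lexicographically with each start date, and because it begins with "0", which sorts before "2", it is smaller than every date, so every ">=" test is False.

```python
import pandas as pd

df2 = pd.DataFrame({
    "teamid": [1, 1, 2, 3, 4],
    "date": ["2017-01-02", "2018-01-02", "2017-06-05", "2014-01-05", "2016-03-02"],
})
start = pd.Series(["2017-07-01", "2016-07-01", "2016-07-01"])

# ONE string holding every row plus the footer, e.g. "0    2017-01-02\n1 ..."
as_text = str(df2["date"])
print(as_text.splitlines()[0])

# Plain string comparison against each start date: "0..." < "2..." everywhere.
mask = as_text >= start
print(mask.tolist())  # [False, False, False]
```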
How should I set up the filter, and why doesn't the last option work?
Answer (score: 2)
IIUC
ndf = pd.merge(df1, df2, on='teamid', how='outer')
ndf.loc[ndf.date.between(ndf.startdate, ndf.enddate)]
teamid points startdate enddate date color
1 1 30.0 2017-07-01 2018-06-30 2018-01-02 yellow
2 2 41.0 2016-07-01 2017-06-30 2017-06-05 blue
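A self-contained sketch of the same merge-then-filter idea, returning rows of df2 rather than the merged frame. It assumes teamid is unique in df1 (one window per team); with several windows per team the merge would duplicate df2 rows. It also swaps in an inner join, since teams missing from either frame can never match, and parses the dates with pd.to_datetime so the range check compares real dates instead of text.

```python
import pandas as pd

# Illustrative reconstruction of the question's tables.
df1 = pd.DataFrame({
    "teamid": [1, 2, 3],
    "points": [30, 41, 32],
    "startdate": ["2017-07-01", "2016-07-01", "2016-07-01"],
    "enddate": ["2018-06-30", "2017-06-30", "2017-06-30"],
})
df2 = pd.DataFrame({
    "teamid": [1, 1, 2, 3, 4],
    "date": ["2017-01-02", "2018-01-02", "2017-06-05", "2014-01-05", "2016-03-02"],
    "color": ["red", "yellow", "blue", "red", "brown"],
})

# Parse dates so comparisons are chronological, not lexicographic.
for col in ("startdate", "enddate"):
    df1[col] = pd.to_datetime(df1[col])
df2["date"] = pd.to_datetime(df2["date"])

# Attach each team's window to every df2 row; reset_index keeps df2's
# original row labels in an "index" column.
merged = df2.reset_index().merge(
    df1[["teamid", "startdate", "enddate"]], on="teamid", how="inner"
)

# between() is inclusive on both ends by default.
in_window = merged["date"].between(merged["startdate"], merged["enddate"])

# Back to df2's shape: original labels and columns only.
df2_filtered = (
    merged.loc[in_window].set_index("index").rename_axis(None)[df2.columns]
)
print(df2_filtered)
```

The answer's how='outer' gives the same matches: unmatched teams get NaT bounds, and between() returns False for them.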