Filter data based on two columns from another DataFrame

Time: 2018-10-29 12:03:39

Tags: pandas dataframe python-3.5

I have the following two data frames:

df = pd.DataFrame({
    'id': ['1', '1', '2', '3', '3', '8', '4', '1', '2', '4'],
    'start': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-04-01', '2017-01-01', '2017-04-01', '2017-05-01', '2017-02-01'],
    'end': ['2017-01-02', '2017-02-4', '2017-03-02', '2017-02-06', '2017-03-01', '2017-04-03', '2017-01-06', '2017-04-08', '2017-05-04', '2017-02-01']
})

df1 = pd.DataFrame({
    'date': ['2017-01-02', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01', '2017-04-01'],
    'id': ['1', '2', '3', '4', '5', '6']
})

I want to extract from df only the rows whose id also exists in df1 and for which the date recorded in df1 for that specific id matches, or lies between, start and end in df.

I can easily extract the matching ids from df by checking whether the same id exists in the second data frame df1:

df_filtered = df[(df['id'].isin(df1['id']))]

But I cannot compare the date in df1 with start and end in df. The output I want contains only the rows that satisfy both conditions.

The date, start and end columns are already in datetime format Y-M-D. Any help would be greatly appreciated.

2 Answers:

Answer 0 (score: 1)

You might want to merge:

df.merge(df1, on='id', how='inner')

        end id       start        date
 0  2017-01-02  1  2017-01-01  2017-01-02
 1   2017-02-4  1  2017-02-01  2017-01-02
 2  2017-04-08  1  2017-04-01  2017-01-02
 3  2017-03-02  2  2017-03-01  2017-02-01
 4  2017-05-04  2  2017-05-01  2017-02-01
 5  2017-02-06  3  2017-02-01  2017-03-01
 6  2017-03-01  3  2017-03-01  2017-03-01
 7  2017-01-06  4  2017-01-01  2017-02-01
 8  2017-02-01  4  2017-02-01  2017-02-01

Then compare the columns.
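The comparison step is not spelled out in this answer, so here is a minimal sketch of one way to do it, using `Series.between` on the merged frame. It reuses the sample data from the question with two assumptions: the `end` value `'2017-02-4'` is written as `'2017-02-04'` so that all dates share one format, and the columns are converted with `pd.to_datetime` because the sample code builds them as strings. The names `merged`, `in_range` and `result` are illustrative only.

```python
import pandas as pd

# Sample data from the question. Assumption: '2017-02-4' is normalized to
# '2017-02-04' so that every date string has the same format.
df = pd.DataFrame({
    'id': ['1', '1', '2', '3', '3', '8', '4', '1', '2', '4'],
    'start': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01',
              '2017-04-01', '2017-01-01', '2017-04-01', '2017-05-01', '2017-02-01'],
    'end': ['2017-01-02', '2017-02-04', '2017-03-02', '2017-02-06', '2017-03-01',
            '2017-04-03', '2017-01-06', '2017-04-08', '2017-05-04', '2017-02-01'],
})
df1 = pd.DataFrame({
    'date': ['2017-01-02', '2017-02-01', '2017-03-01',
             '2017-02-01', '2017-03-01', '2017-04-01'],
    'id': ['1', '2', '3', '4', '5', '6'],
})

# Parse the date-like columns so the comparison is chronological, not textual.
for col in ('start', 'end'):
    df[col] = pd.to_datetime(df[col])
df1['date'] = pd.to_datetime(df1['date'])

# Inner merge keeps only ids present in both frames.
merged = df.merge(df1, on='id', how='inner')

# Series.between includes both endpoints by default.
in_range = merged['date'].between(merged['start'], merged['end'])

# Sort by id so the row order does not depend on the pandas version.
result = merged[in_range].sort_values('id').reset_index(drop=True)
print(result)
```

With this data, three rows should survive the filter: one each for ids 1, 3 and 4.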

Answer 1 (score: 1)

Merge and filter:

df2 = df.merge(df1)
df2[(df2['date']>=df2['start'])&(df2['date']<=df2['end'])]
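The merged frame above also carries the `date` column and a fresh index. If the goal is to filter `df` itself and keep its original columns and row labels, one possible alternative (not part of either answer) is to look up the date for each id with `Series.map` and build a boolean mask. This sketch assumes each id appears at most once in `df1`, which holds for the sample data, and again writes `'2017-02-4'` as `'2017-02-04'`. The names `date_for_id` and `mask` are illustrative.

```python
import pandas as pd

# Sample data from the question. Assumption: '2017-02-4' is normalized to
# '2017-02-04' so that every date string has the same format.
df = pd.DataFrame({
    'id': ['1', '1', '2', '3', '3', '8', '4', '1', '2', '4'],
    'start': ['2017-01-01', '2017-02-01', '2017-03-01', '2017-02-01', '2017-03-01',
              '2017-04-01', '2017-01-01', '2017-04-01', '2017-05-01', '2017-02-01'],
    'end': ['2017-01-02', '2017-02-04', '2017-03-02', '2017-02-06', '2017-03-01',
            '2017-04-03', '2017-01-06', '2017-04-08', '2017-05-04', '2017-02-01'],
})
df1 = pd.DataFrame({
    'date': ['2017-01-02', '2017-02-01', '2017-03-01',
             '2017-02-01', '2017-03-01', '2017-04-01'],
    'id': ['1', '2', '3', '4', '5', '6'],
})

for col in ('start', 'end'):
    df[col] = pd.to_datetime(df[col])
df1['date'] = pd.to_datetime(df1['date'])

# Look up the df1 date for each row of df by id.
# Ids missing from df1 (here '8') map to NaT.
# Assumption: ids are unique in df1, otherwise the lookup is ambiguous.
date_for_id = df['id'].map(df1.set_index('id')['date'])

# Comparisons against NaT are False, so unmatched ids drop out of the mask.
mask = date_for_id.between(df['start'], df['end'])

df_filtered = df[mask]
print(df_filtered)
```

Because the mask is aligned with `df`, the result keeps the original row labels (0, 4 and 9 for this data) and only the `id`, `start` and `end` columns.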