How do I extract unique rows based on a rule?

Time: 2019-02-19 07:14:11

Tags: python pandas

I have a data frame with columns like this:

id_1,date_1,id_2,date_2

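I need a data frame containing only the rows where (date_1 + 15 days) < date_2. If this rule matches, I only need the first occurrence.

A boolean mask alone does not solve the problem.

So I think I may need to use something like for index, row in df.iterrows(): and build a new data frame.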

2 answers:

Answer 0: (score: 0)

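import pandas as pd
from datetime import timedelta

# Sample data (note: this answer names the columns date1/date2, not date_1/date_2)
df = pd.DataFrame(data={'id_1': [1, 2, 3, 4],
                        'date1': ['2018-01-10', '2018-02-05', '2018-02-20', '2018-02-21'],
                        'date2': ['2018-01-11', '2018-02-15', '2018-02-27', '2018-02-22']})

# Convert both date columns from strings to datetime64
df[['date1', 'date2']] = df[['date1', 'date2']].apply(pd.to_datetime)

# date1 shifted forward by 15 days
df['date1_15'] = df['date1'] + timedelta(15)

# Keep the rows where date1 + 15 days < date2, then take only the first one.
# With the sample values above no row satisfies the rule, so the result is empty.
df = df.loc[df['date1_15'] < df['date2']].head(1)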
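
The same idea can be sketched with data in which some rows do satisfy the rule. The values below are hypothetical and were chosen only so that the mask selects something. The column names follow the question, and the variable names `mask`, `matches` and `first_match` are illustrative.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    'id_1': [1, 2, 3, 4],
    'date_1': ['2018-01-10', '2018-02-05', '2018-02-20', '2018-02-21'],
    'id_2': [10, 20, 30, 40],
    'date_2': ['2018-01-11', '2018-03-01', '2018-02-27', '2018-03-20'],
})
df[['date_1', 'date_2']] = df[['date_1', 'date_2']].apply(pd.to_datetime)

# Rule from the question: date_1 + 15 days < date_2
mask = df['date_1'] + pd.Timedelta(days=15) < df['date_2']

matches = df.loc[mask]         # every row that satisfies the rule
first_match = matches.head(1)  # only the first such row in the frame
```

Here `head(1)` returns the first matching row of the whole frame. If "first occurrence" is meant per key rather than overall, a grouping step is needed instead.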

Answer 1: (score: 0)

import pandas as pd
from datetime import timedelta

# dd holds the question's data; it is not defined in this answer
df = pd.DataFrame(data=dd)
df[['date_1', 'date_2']] = df[['date_1', 'date_2']].apply(pd.to_datetime)

# Rule from the question: date_1 + 15 days < date_2
df['date_1_15'] = df['date_1'] + timedelta(15)

# Vectorized comparison; gives the same True/False column as a row-wise apply
df['mask'] = df['date_1_15'] < df['date_2']

# Keep the first matching row for each date_1 and flag it
dx = df.loc[df['mask']]
dx = dx.groupby(['date_1']).first()
dx['mask_first'] = True
dx = dx.reset_index()
dx = dx[['date_1', 'date_2', 'mask_first']]

# Merge the flag back; rows that are not a first match get NaN in mask_first
df = pd.merge(df, dx, on=['date_1', 'date_2'], how='outer')
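
If "first occurrence" means the first matching row per `id_1` (an assumption, since the question does not say which key defines uniqueness), a shorter sketch is to filter with the mask and then drop repeated keys. The data below is hypothetical and the name `first_per_id` is illustrative.

```python
import pandas as pd

# Hypothetical sample data with repeated id_1 values.
df = pd.DataFrame({
    'id_1': [1, 1, 2, 2, 3],
    'date_1': ['2018-01-01', '2018-01-01', '2018-02-01', '2018-02-01', '2018-03-01'],
    'id_2': [10, 11, 20, 21, 30],
    'date_2': ['2018-01-20', '2018-01-25', '2018-02-10', '2018-02-28', '2018-03-05'],
})
df[['date_1', 'date_2']] = df[['date_1', 'date_2']].apply(pd.to_datetime)

# Rule from the question: date_1 + 15 days < date_2
mask = df['date_1'] + pd.Timedelta(days=15) < df['date_2']

# Assumption: uniqueness is defined by id_1. Keep the first matching row per id_1.
first_per_id = df.loc[mask].drop_duplicates(subset='id_1', keep='first')
```

This avoids both `iterrows()` and the merge step. Swapping `subset='id_1'` for another column, such as `date_1`, changes which key defines "first".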