Pandas: conditional information retrieval within a date range

Asked: 2017-06-07 21:45:47

Tags: pandas numpy data-management

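I'm still fairly new to pandas, and the script I wrote for a seemingly simple task feels unnecessarily complicated. If you know a simpler way to do this, I would be very grateful.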

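Task: I have two spreadsheets (df1 and df2), each with an identifier (mrn) and a date. For each row in df1, I need to retrieve a value from df2 if the following conditions are met:

The identifier of the given df1 row exists in df2.

If the condition above is met, the date associated with it in df2 is within +/- 5 days of the date in df1. In that case, the value from df2 is retrieved.

I wrote the following code to do this: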

#%%housekeeping
import pandas as pd
from datetime import timedelta
from io import StringIO

#%%dataframe import
df1=',mrn,date,foo\n0,1,2015-03-06,n/a\n1,11,2009-08-14,n/a\n2,14,2009-05-18,n/a\n3,20,2010-06-19,n/a\n'
df2=',mrn,collection Date,Report\n0,1,2015-03-06,report to import1\n1,11,2009-08-12,report to import11\n2,14,2009-05-21,report to import14\n3,20,2010-06-25,report to import20\n'

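# index_col=0 uses the unnamed first CSV column as the index,
# which matches the output shown at the end
df1 = pd.read_csv(StringIO(df1), index_col=0)
df2 = pd.read_csv(StringIO(df2), index_col=0)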


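#converting to date-time format
df1['date']=pd.to_datetime(df1['date'])
df2['collection Date']=pd.to_datetime(df2['collection Date'])
# 'n/a' is read as NaN, so foo starts out as a float column;
# cast it to object before storing report strings in it
df1['foo']=df1['foo'].astype(object)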

#%%mask()
def mask(dates, rangeTime):
    # True where a date lies within +/- 5 days of rangeTime (both ends inclusive)
    window = timedelta(days=5)
    return (dates >= rangeTime - window) & (dates <= rangeTime + window)

#%% detailLoop()
for i, element in df1["mrn"].items():
    # .ix has been removed from pandas; .loc does the same label-based lookup
    df1DateIter = df1.loc[i, 'date']
    df2MRNmatch = df2.loc[df2['mrn']==element, ['collection Date', 'Report']]
    df2Date = df2MRNmatch['collection Date']
    df2Report = df2MRNmatch['Report']
    maskOut = mask(df2Date, df1DateIter)
    # only the first df2 row for this mrn is checked; mrns missing from df2 are skipped
    if not maskOut.empty and maskOut.iloc[0]:
        df1.loc[i, 'foo'] = df2Report.iloc[0]

Once the script has run, df1 looks like:

Out[824]: 

   mrn       date                 foo
0    1 2015-03-06   report to import1
1   11 2009-08-14  report to import11
2   14 2009-05-18  report to import14
3   20 2010-06-19                 NaN
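
One possible simplification, offered only as a sketch: replace the loop with a left merge on mrn followed by a vectorized date comparison. The names csv1, csv2, merged, within and result are illustrative and not part of the original script. The sketch assumes that df2 holds at most one row per mrn, as in the sample data; with several rows per mrn the merge would return several rows for the same df1 row.

```python
from io import StringIO

import pandas as pd

# Same sample data as in the question (csv1/csv2 are illustrative names).
csv1 = (',mrn,date,foo\n0,1,2015-03-06,n/a\n1,11,2009-08-14,n/a\n'
        '2,14,2009-05-18,n/a\n3,20,2010-06-19,n/a\n')
csv2 = (',mrn,collection Date,Report\n0,1,2015-03-06,report to import1\n'
        '1,11,2009-08-12,report to import11\n2,14,2009-05-21,report to import14\n'
        '3,20,2010-06-25,report to import20\n')

df1 = pd.read_csv(StringIO(csv1), index_col=0, parse_dates=['date'])
df2 = pd.read_csv(StringIO(csv2), index_col=0, parse_dates=['collection Date'])

# Left join on mrn: every df1 row is kept, even if its mrn is missing from df2.
merged = df1.drop(columns='foo').merge(df2, on='mrn', how='left')

# True where the two dates are at most 5 days apart (False for unmatched rows).
within = (merged['collection Date'] - merged['date']).abs() <= pd.Timedelta(days=5)

# Keep the report only where the date condition holds; everything else becomes missing.
merged['foo'] = merged['Report'].where(within)

result = merged[['mrn', 'date', 'foo']]
print(result)
```

If df2 can contain several rows per mrn, pd.merge_asof with by='mrn', direction='nearest' and tolerance=pd.Timedelta(days=5) may be a better fit, since it picks the closest date for each row; it requires both frames to be sorted by their date columns.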

0 answers:

No answers yet