Set values in df1 where they match values in df2

Time: 2019-12-12 12:03:31

Tags: python pandas dataframe

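I have 2 dataframes. df1 holds my data and df2 holds some corrections. I want to replace the speedup values in df1 with the speedup values from df2, where the other columns of df2 (subject and date) specify which rows of df1 to replace.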

import pandas as pd

df1 = pd.DataFrame({
    'subject': ['English', 'Maths', 'Physics', 'English', 'Arts', 'Physics', 'English', 'PE'],
    'grade': ['D', 'A', 'A', 'C', 'F', 'B', 'C', 'A'],
    'date': pd.bdate_range(end='2019-12-12', periods=8)
})

df1['speedup'] = 1.0

df2 = pd.DataFrame({
    'subject': ['Maths', 'Physics'],
    'date': ['2019-12-04', '2019-12-10'],
    'speedup': [1.1, 0.7]
})

The code above produces the following DataFrames:

Out[1]: 
   subject grade       date  speedup
0  English     D 2019-12-03      1.0
1    Maths     A 2019-12-04      1.0
2  Physics     A 2019-12-05      1.0
3  English     C 2019-12-06      1.0
4     Arts     F 2019-12-09      1.0
5  Physics     B 2019-12-10      1.0
6  English     C 2019-12-11      1.0
7       PE     A 2019-12-12      1.0
df2
Out[2]: 
   subject        date  speedup
0    Maths  2019-12-04      1.1
1  Physics  2019-12-10      0.7

To be clear, after combining df1 and df2 I want df1 to look like this:

df1 = pd.DataFrame({
        'subject': ['English', 'Maths', 'Physics', 'English', 'Arts', 'Physics', 'English', 'PE'],
        'grade': ['D', 'A', 'A', 'C', 'F', 'B', 'C', 'A'],
        'date': pd.bdate_range(end='2019-12-12', periods=8),
        'speedup': [1, 1.1, 1, 1, 1, 0.7, 1, 1]
    })

I tried this, but it does not work:

df1[(df1['date'].isin(df2['date'])) & (df1['subject'].isin(df2['subject']))]['speedup'] = df2['speedup']

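Merging does not work either, because of the datetime component in the merge keys: df1['date'] holds datetime values while df2['date'] holds strings.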

df1.merge(df2, left_on=['subject', 'date'], right_on=['subject', 'date'], suffixes=('', '_y'))

2 Answers:

Answer 0: (score: 1)

Convert the string dates to datetime objects, then perform the merge:

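# Convert the string dates in df2 so they match the datetime64 dtype of df1['date']
df2['date'] = pd.to_datetime(df2['date'], format='%Y-%m-%d')
# Left merge keeps all rows of df1; ffill copies speedup_x into speedup_y where df2 has no match
df1.merge(df2, how='left', on=['subject', 'date']).ffill(axis=1)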

This gives you the following result: [image]
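
If the goal is a frame with a single speedup column, as in the question, the merge can be followed by one more step. The following is only a minimal sketch of that idea, not part of the original answer: the `_new` suffix and the names `merged` and `result` are my own choices, and the `astype` call is a precaution so both date columns share exactly the same dtype.

```python
import pandas as pd

df1 = pd.DataFrame({
    'subject': ['English', 'Maths', 'Physics', 'English', 'Arts', 'Physics', 'English', 'PE'],
    'grade': ['D', 'A', 'A', 'C', 'F', 'B', 'C', 'A'],
    'date': pd.bdate_range(end='2019-12-12', periods=8),
})
df1['speedup'] = 1.0

df2 = pd.DataFrame({
    'subject': ['Maths', 'Physics'],
    'date': ['2019-12-04', '2019-12-10'],
    'speedup': [1.1, 0.7],
})

# Make the merge keys comparable: parse the strings and match df1's dtype.
df2['date'] = pd.to_datetime(df2['date'], format='%Y-%m-%d').astype(df1['date'].dtype)

# Left merge keeps every row of df1; rows without a correction get NaN
# in the (hypothetically named) 'speedup_new' column.
merged = df1.merge(df2, how='left', on=['subject', 'date'], suffixes=('', '_new'))

# Prefer the correction where one exists, otherwise keep the original value.
merged['speedup'] = merged['speedup_new'].fillna(merged['speedup'])
result = merged.drop(columns='speedup_new')
print(result)
```

This assumes each (subject, date) pair appears at most once in df2; duplicate pairs would duplicate the matching rows of df1 in the merge.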

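Answer 1: (score: 0)

I think that, instead of merging the two dataframes, a more efficient approach is to use the second dataframe, df2, as a dictionary keyed by a two-level index.

The code is as follows: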

# Set the index of df2 to (date, subject)
df2 = df2.set_index(['date', 'subject'])

# Make sure the dates in df1 use the same string format as the index of df2
df1['date'] = df1['date'].dt.strftime('%Y-%m-%d')

# Iterate over the rows of df1 and look up a replacement speedup in df2
for i, row in df1.iterrows():
    # Build the (date, subject) key for this row
    key = (row['date'], row['subject'])

    # If df2 contains a speedup for this key, replace the existing speedup
    if key in df2.index:
        df1.loc[i, 'speedup'] = df2.loc[key, 'speedup']

After this, df1 looks as follows:

   subject grade        date  speedup
0  English     D  2019-12-03      1.0
1    Maths     A  2019-12-04      1.1
2  Physics     A  2019-12-05      1.0
3  English     C  2019-12-06      1.0
4     Arts     F  2019-12-09      1.0
5  Physics     B  2019-12-10      0.7
6  English     C  2019-12-11      1.0
7       PE     A  2019-12-12      1.0
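
The same two-level lookup can probably be done without a Python loop. The sketch below is one possible variant, not the code from this answer: it keeps the dates as datetimes instead of strings, and the names `corrections`, `keys` and `looked_up` are my own. It assumes each (subject, date) pair occurs at most once in df2.

```python
import pandas as pd

df1 = pd.DataFrame({
    'subject': ['English', 'Maths', 'Physics', 'English', 'Arts', 'Physics', 'English', 'PE'],
    'grade': ['D', 'A', 'A', 'C', 'F', 'B', 'C', 'A'],
    'date': pd.bdate_range(end='2019-12-12', periods=8),
})
df1['speedup'] = 1.0

df2 = pd.DataFrame({
    'subject': ['Maths', 'Physics'],
    'date': ['2019-12-04', '2019-12-10'],
    'speedup': [1.1, 0.7],
})

# Parse the string dates and match df1's dtype so the keys compare equal.
df2['date'] = pd.to_datetime(df2['date']).astype(df1['date'].dtype)

# Use (subject, date) as a two-level key into the corrections.
corrections = df2.set_index(['subject', 'date'])['speedup']

# Build the same two-level key for every row of df1 and look it up;
# keys that are missing from the corrections come back as NaN.
keys = pd.MultiIndex.from_frame(df1[['subject', 'date']])
looked_up = pd.Series(corrections.reindex(keys).to_numpy(), index=df1.index)

# Keep the original speedup wherever no correction was found.
df1['speedup'] = looked_up.fillna(df1['speedup'])
print(df1)
```

Compared with the row-by-row loop, this does a single aligned lookup for all rows, which tends to matter once df1 grows beyond a few thousand rows.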