Update a pandas DataFrame based on date values in two DataFrames

Time: 2020-09-06 04:55:45

Tags: python pandas dataframe merge

I have two DataFrames, as shown below:

year1 = {'DAY':['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP':[12, 13, 14, 15, 15, 18],
    'DATE':['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '08/01/20']}
df1 = pd.DataFrame(year1)

year2 = {'DAY':['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP':[15, 15, 15, 15, 14, 14],
    'DATE':['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '10/01/20']}
df2 = pd.DataFrame(year2)

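The DataFrames are not indexed by date (the index is some other column). I want to merge the rows whose date values match in both DataFrames and add a new column computed from the matched rows: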

df_FINAL['AVG_TEMP'] = (df1['TEMP'] + df2['TEMP']) / 2

So the final DataFrame should look like this:

   DAY  TEMP      DATE    AVG_TEMP
0  MON    15  01/01/20     13.5
1  MON    15  02/01/20     14.0
2  MON    15  03/01/20     14.5
3  TUE    15  06/01/20     15.0
4  TUE    14  07/01/20     14.5

How can I do this?

4 answers:

Answer 0: (score: 2)

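You can use pd.merge on the DATE and DAY columns, since the same date always falls on the same day. Take the mean of the TEMP_x and TEMP_y columns that the merge creates as AVG_TEMP, then drop the TEMP_x and TEMP_y columns.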

import pandas as pd

year1 = {'DAY':['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP':[12, 13, 14, 15, 15, 18],
    'DATE':['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '08/01/20']}
df1 = pd.DataFrame(year1)

year2 = {'DAY':['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP':[15, 15, 15, 15, 14, 14],
    'DATE':['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '10/01/20']}
df2 = pd.DataFrame(year2)

df_result = df1.merge(df2, on=["DATE","DAY"])
df_result['AVG_TEMP'] = (df_result['TEMP_x'] + df_result['TEMP_y']) / 2
df_result = df_result.drop(columns=['TEMP_x','TEMP_y'])

Output:

>>> df_result
   DAY      DATE  AVG_TEMP
0  MON  01/01/20      13.5
1  MON  02/01/20      14.0
2  MON  03/01/20      14.5
3  TUE  06/01/20      15.0
4  TUE  07/01/20      14.5
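
As a small variation on the answer above, here is a minimal sketch that passes explicit suffixes so the merged temperature columns get readable names, and averages them with mean(axis=1). The suffixes _1 and _2 and the names merged, temp_cols and result are my own choices for illustration, not part of the original answer.

```python
import pandas as pd

df1 = pd.DataFrame({
    'DAY': ['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP': [12, 13, 14, 15, 15, 18],
    'DATE': ['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '08/01/20'],
})
df2 = pd.DataFrame({
    'DAY': ['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP': [15, 15, 15, 15, 14, 14],
    'DATE': ['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '10/01/20'],
})

# Inner merge (the default); the overlapping TEMP column becomes TEMP_1 / TEMP_2
merged = df1.merge(df2, on=['DATE', 'DAY'], suffixes=('_1', '_2'))

# Row-wise mean of the two temperature columns
temp_cols = ['TEMP_1', 'TEMP_2']
merged['AVG_TEMP'] = merged[temp_cols].mean(axis=1)

# Keep only DAY, DATE and AVG_TEMP
result = merged.drop(columns=temp_cols)
print(result)
```

Using mean(axis=1) also extends naturally if you later merge in a third year: add its column to temp_cols and the rest stays the same.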

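Answer 1: (score: 0)

Call pd.merge() on both columns with an inner join (a value must be present in both DataFrames to appear in the result) to create an intermediate DataFrame. Then create a new column that holds the average.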

df3 = df1.merge(df2,on=['DATE','DAY'],how='inner')
df3['AVG_TEMP'] = (df3.TEMP_x + df3.TEMP_y)/2
df3.drop(['TEMP_x','TEMP_y'],inplace=True,axis=1)
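
The inner join silently drops dates that appear in only one DataFrame. If you want to see which ones, a minimal sketch (assuming the sample data from the question) is an outer merge with indicator=True, which adds a _merge column marking each row as left_only, right_only or both. The names check and unmatched are just for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    'DAY': ['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP': [12, 13, 14, 15, 15, 18],
    'DATE': ['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '08/01/20'],
})
df2 = pd.DataFrame({
    'DAY': ['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP': [15, 15, 15, 15, 14, 14],
    'DATE': ['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '10/01/20'],
})

# Outer merge keeps every row from both sides; _merge records where each row came from
check = df1.merge(df2, on=['DATE', 'DAY'], how='outer', indicator=True)

# Rows that an inner join would have discarded
unmatched = check.loc[check['_merge'] != 'both', ['DATE', '_merge']]
print(unmatched)
```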

Answer 2: (score: 0)

You can do all of this with merge and a lambda function. I have also included an alternative so you know which options are available to you.

import pandas as pd
year1 = {'DAY':['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP':[12, 13, 14, 15, 15, 18],
    'DATE':['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '08/01/20']}
df1 = pd.DataFrame(year1)

year2 = {'DAY':['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP':[15, 15, 15, 15, 14, 14],
    'DATE':['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '10/01/20']}
df2 = pd.DataFrame(year2)

# Inner join, based on your example.
# You can use either rename or suffixes; here I am using suffixes.
# The first suffix is empty, so the df2 columns keep their names;
# the second is _y, and those columns are dropped later.
# The .rename call is left commented out in case you want to try that option.

The answer to your question starts here:

df_FINAL = (pd.merge(df2, df1, on = "DATE",how='inner',suffixes=('', '_y'))        
        #.rename(columns={'DAY_x':'DAY','TEMP_x':'TEMP'})
        .assign(AVG_TEMP = lambda x: (x['TEMP'] + x['TEMP_y'])/2))

#drop the _y columns as you don't need them
df_FINAL.drop(list(df_FINAL.filter(regex='_y$')), axis=1, inplace=True)

print(df_FINAL)

Another way to do this is to combine everything into a single statement, as follows:

# Inner join, based on your example.
# The first suffix is empty; the second is _y, and those columns are dropped later.
# After the average is computed, filter out every column whose name contains _y.

df_FINAL = (pd.merge(df2, df1, on = "DATE",how='inner',suffixes=('', '_y'))        
        .assign(AVG_TEMP = lambda x: (x['TEMP'] + x['TEMP_y'])/2)
        .filter(regex='^(?!.*_y)'))

The final result looks like this:

   DAY  TEMP      DATE  AVG_TEMP
0  MON    15  01/01/20      13.5
1  MON    15  02/01/20      14.0
2  MON    15  03/01/20      14.5
3  TUE    15  06/01/20      15.0
4  TUE    14  07/01/20      14.5
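
If you would rather not deal with suffixed columns at all, one alternative sketch is to look up the df1 temperature for each df2 date with Series.map. This is not from the answer above; it assumes each DATE appears at most once in df1 (map needs a unique index to look up against), and the name temp1_by_date is only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'DAY': ['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP': [12, 13, 14, 15, 15, 18],
    'DATE': ['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '08/01/20'],
})
df2 = pd.DataFrame({
    'DAY': ['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP': [15, 15, 15, 15, 14, 14],
    'DATE': ['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '10/01/20'],
})

# Lookup table: DATE -> TEMP from df1 (assumes DATE is unique in df1)
temp1_by_date = df1.set_index('DATE')['TEMP']

# Dates missing from df1 map to NaN, so their average is NaN as well
result = df2.assign(AVG_TEMP=(df2['TEMP'] + df2['DATE'].map(temp1_by_date)) / 2)

# Keep only the dates present in both DataFrames
result = result.dropna(subset=['AVG_TEMP'])
print(result)
```

Because nothing is merged, df2 keeps its own index and column order, which may matter since the question says the frames are indexed by some other column.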

Answer 3: (score: 0)

Use pd.concat() and df.groupby:

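df3 = pd.concat([df2, df1], ignore_index=True)
# transform returns one value per row, so the result lines up with df3
temp_by_date = df3.groupby('DATE')['TEMP']
df3['AVG_TEMP'] = temp_by_date.transform('mean').where(temp_by_date.transform('count') > 1)
df3 = df3.groupby('DATE', as_index=False).first().dropna()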

print(df3)

Output:

       DATE  DAY  TEMP  AVG_TEMP
0  01/01/20  MON    15      13.5
1  02/01/20  MON    15      14.0
2  03/01/20  MON    15      14.5
3  06/01/20  TUE    15      15.0
4  07/01/20  TUE    14      14.5
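
The same concat-and-group idea can also be written with named aggregation, which computes the mean and the row count per date in one pass. This is only a sketch under the assumption that each date appears at most once per DataFrame, so a count of 2 means the date is present in both; the column name N and the variables stacked and summary are hypothetical.

```python
import pandas as pd

df1 = pd.DataFrame({
    'DAY': ['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP': [12, 13, 14, 15, 15, 18],
    'DATE': ['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '08/01/20'],
})
df2 = pd.DataFrame({
    'DAY': ['MON', 'MON', 'MON', 'TUE', 'TUE', 'TUE'],
    'TEMP': [15, 15, 15, 15, 14, 14],
    'DATE': ['01/01/20', '02/01/20', '03/01/20', '06/01/20', '07/01/20', '10/01/20'],
})

# Stack both years into one long frame
stacked = pd.concat([df1, df2], ignore_index=True)

# One row per date: the day, the mean temperature, and how many rows contributed
summary = stacked.groupby('DATE', as_index=False).agg(
    DAY=('DAY', 'first'),
    AVG_TEMP=('TEMP', 'mean'),
    N=('TEMP', 'count'),
)

# Keep only dates found in both DataFrames, then drop the helper column
result = summary[summary['N'] == 2].drop(columns='N').reset_index(drop=True)
print(result)
```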