pandas - How to merge DataFrames on datetime columns with different formats?

Posted: 2019-08-06 12:03:15

Tags: python pandas merge timestamp

I have two DataFrames that I need to merge on the date. The first one looks like this:

             Time Stamp  HP_1H_mean  Coolant1_1H_mean  Extreme_1H_mean
0   2019-07-26 07:00:00  410.637966        414.607081              0.0   
1   2019-07-26 08:00:00  403.521735        424.787366              0.0   
2   2019-07-26 09:00:00  403.143925        425.739639              0.0   
3   2019-07-26 10:00:00  410.542895        426.210538              0.0
...
17  2019-07-27 00:00:00    0.000000          0.000000              0.0   
18  2019-07-27 01:00:00    0.000000          0.000000              0.0   
19  2019-07-27 02:00:00    0.000000          0.000000              0.0   
20  2019-07-27 03:00:00    0.000000          0.000000              0.0 

The second one looks like this:

    Time Stamp  Qty Compl
0   2019-07-26  150
1   2019-07-27  20
2   2019-07-29  230
3   2019-07-30  230
4   2019-07-31  170

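Both Time Stamp columns are datetime64[ns]. I want to do a left merge and then forward-fill each day's value into all the other rows of that day. My problem is that the merge only attaches the second DataFrame's Qty Compl to the midnight row of each day, and some days have no midnight timestamp, for example the first day in the first DataFrame.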
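To make the problem concrete, here is a minimal reproduction on a few abbreviated rows (the sample values are trimmed from the tables above). An exact-key left merge only matches timestamps that are exactly midnight, and a subsequent forward-fill cannot recover a day whose midnight row is missing:

```python
import pandas as pd

# Small stand-ins for the two DataFrames in the question (values abbreviated)
df1 = pd.DataFrame({
    "Time Stamp": pd.to_datetime([
        "2019-07-26 07:00", "2019-07-26 08:00",
        "2019-07-27 00:00", "2019-07-27 01:00",
    ]),
    "HP_1H_mean": [410.637966, 403.521735, 0.0, 0.0],
})
df2 = pd.DataFrame({
    "Time Stamp": pd.to_datetime(["2019-07-26", "2019-07-27"]),
    "Qty Compl": [150, 20],
})

# An exact-key left merge only matches rows whose timestamp is exactly midnight
naive = df1.merge(df2, on="Time Stamp", how="left")
print(naive["Qty Compl"].tolist())  # [nan, nan, 20.0, nan]

# Forward-filling repairs 2019-07-27 but cannot fill 2019-07-26,
# because that day has no midnight row to carry the value from
filled = naive["Qty Compl"].ffill()
print(filled.tolist())  # [nan, nan, 20.0, 20.0]
```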

Is there a way to merge so that every row falling on the same day gets matched? The desired output looks like this:

             Time Stamp  HP_1H_mean  Coolant1_1H_mean  Extreme_1H_mean    Qty Compl
0   2019-07-26 07:00:00  410.637966        414.607081              0.0      150   
1   2019-07-26 08:00:00  403.521735        424.787366              0.0      150
2   2019-07-26 09:00:00  403.143925        425.739639              0.0      150
3   2019-07-26 10:00:00  410.542895        426.210538              0.0      150
...
17  2019-07-27 00:00:00    0.000000          0.000000              0.0      20
18  2019-07-27 01:00:00    0.000000          0.000000              0.0      20
19  2019-07-27 02:00:00    0.000000          0.000000              0.0      20
20  2019-07-27 03:00:00    0.000000          0.000000              0.0      20
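One possible way to "match every row on the same day" literally, sketched here as an alternative rather than as the accepted answer, is to build a date-only key with `Series.dt.normalize()` (which sets the time part of each timestamp to midnight) and do an ordinary left merge on it. The helper column name `day` is hypothetical, and the sample rows are abbreviated from the question:

```python
import pandas as pd

df1 = pd.DataFrame({
    "Time Stamp": pd.to_datetime([
        "2019-07-26 07:00", "2019-07-26 10:00",
        "2019-07-27 00:00", "2019-07-27 03:00",
    ]),
    "HP_1H_mean": [410.637966, 410.542895, 0.0, 0.0],
})
df2 = pd.DataFrame({
    "Time Stamp": pd.to_datetime(["2019-07-26", "2019-07-27", "2019-07-29"]),
    "Qty Compl": [150, 20, 230],
})

# "day" is a hypothetical helper column: each timestamp truncated to midnight
left = df1.assign(day=df1["Time Stamp"].dt.normalize())
right = df2.rename(columns={"Time Stamp": "day"})

# Exact left merge on the calendar day, then drop the helper key
out = left.merge(right, on="day", how="left").drop(columns="day")
print(out)
```

With this approach a day that is absent from the second DataFrame produces NaN in Qty Compl instead of inheriting another day's value.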

1 Answer

Answer 0 (score: 2)

Use merge_asof, with both DataFrames sorted by their datetime column:

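import pandas as pd

#if necessary: convert both columns to datetime64[ns]
df1['Time Stamp'] = pd.to_datetime(df1['Time Stamp'])
df2['Time Stamp'] = pd.to_datetime(df2['Time Stamp'])
# merge_asof requires both frames to be sorted on the merge key
df1 = df1.sort_values('Time Stamp')
df2 = df2.sort_values('Time Stamp')

# each row of df1 takes the last row of df2 whose Time Stamp is <= its own
df = pd.merge_asof(df1, df2, on='Time Stamp')
print(df)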
           Time Stamp  HP_1H_mean  Coolant1_1H_mean  Extreme_1H_mean  \
0 2019-07-26 07:00:00  410.637966        414.607081              0.0   
1 2019-07-26 08:00:00  403.521735        424.787366              0.0   
2 2019-07-26 09:00:00  403.143925        425.739639              0.0   
3 2019-07-26 10:00:00  410.542895        426.210538              0.0   
4 2019-07-27 00:00:00    0.000000          0.000000              0.0   
5 2019-07-27 01:00:00    0.000000          0.000000              0.0   
6 2019-07-27 02:00:00    0.000000          0.000000              0.0   
7 2019-07-27 03:00:00    0.000000          0.000000              0.0   

   Qty Compl  
0        150  
1        150  
2        150  
3        150  
4         20  
5         20  
6         20  
7         20
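A note on why this works: `merge_asof` defaults to `direction="backward"`, so each row takes the last row of the second DataFrame whose Time Stamp is at or before its own. Because those timestamps are all midnight, that is the same day's value whenever the day exists there. If a day is missing (2019-07-28 is absent from the second DataFrame in the question), rows on that day silently inherit the previous day's quantity. The sketch below, on abbreviated sample rows, shows that behavior:

```python
import pandas as pd

# Hourly rows, including a day (2019-07-28) that is absent from df2
df1 = pd.DataFrame({
    "Time Stamp": pd.to_datetime([
        "2019-07-26 07:00", "2019-07-26 10:00",
        "2019-07-27 03:00", "2019-07-28 05:00",
    ]),
    "HP_1H_mean": [410.637966, 410.542895, 0.0, 0.0],
})
df2 = pd.DataFrame({
    "Time Stamp": pd.to_datetime(["2019-07-26", "2019-07-27", "2019-07-29"]),
    "Qty Compl": [150, 20, 230],
})

# Both frames must be sorted on the "on" key for merge_asof
df1 = df1.sort_values("Time Stamp")
df2 = df2.sort_values("Time Stamp")

# direction="backward" is the default: take the last df2 row at or before each df1 time
asof = pd.merge_asof(df1, df2, on="Time Stamp", direction="backward")
print(asof["Qty Compl"].tolist())  # [150, 150, 20, 20]
```

The 2019-07-28 row gets 20 from 2019-07-27. If that carry-over is not wanted, an exact merge on the normalized date (`dt.normalize()`) leaves NaN for such days instead.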