Question

I have two different datasets.

Weather:

    time                dayweather
0   2015-11-14 01:00:00   Clouds
1   2015-11-14 03:00:00   Clouds
2   2015-11-14 05:00:00   Clouds
3   2015-11-14 06:00:00   Clouds
4   2015-11-14 08:00:00   Clouds

Speed:

    id            time                  machine_id    speed_gps_kph  latitude     longitude
0   14641931007   2015-11-15 17:46:40         10051         3       -36.725578    174.71482
1   14642568129   2015-11-15 18:12:41         10051        13       -36.769465    174.74159
2   14641876524   2015-11-15 17:44:30         10051        0        -36.723136    174.714432
3   14642262476   2015-11-15 18:00:47         10051        17       -36.747435    174.723397
4   14641991113   2015-11-15 17:49:43         10051        6        -36.72826     174.715083

I need to merge these two datasets on time and then check how the weather affects the data. So I merged them:

from functools import reduce
import pandas as pd

dfs = [dataframe, weather]
checkweather = reduce(lambda left, right: pd.merge(left, right, how='left', on='time'), dfs)

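My problem is that, as you can see, dayweather is only filled in where the two timestamps are exactly identical.

The weather is only recorded at certain hours, so how can I match each speed record to a weather reading when its time falls in the gap of up to 2 hours between readings?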

Answer 1

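A solution with merge_asof, which by default matches each left row to the most recent right row at or before its time; tolerance caps how far back that match may be. Note that in this answer dataframe holds the weather readings and weather holds the speed records: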

print (dataframe)
                 time dayweather
0 2015-11-14 01:00:00    Clouds1
1 2015-11-14 03:00:00    Clouds2
2 2015-11-14 05:00:00    Clouds3
3 2015-11-14 06:00:00    Clouds4
4 2015-11-14 08:00:00       Rain

print (weather)
            id                time  machine_id  speed_gps_kph   latitude  \
0  14641931007 2015-11-14 04:46:40       10051              3 -36.725578   
1  14642568129 2015-11-14 05:12:41       10051             13 -36.769465   
2  14641876524 2015-11-14 06:44:30       10051              0 -36.723136   
3  14642262476 2015-11-14 07:00:47       10051             17 -36.747435   
4  14641991113 2015-11-15 17:49:43       10051              6 -36.728260   

    longitude  
0  174.714820  
1  174.741590  
2  174.714432  
3  174.723397  
4  174.715083

# both frames must already be sorted by 'time'
df = pd.merge_asof(weather, dataframe, on='time', tolerance=pd.Timedelta('2h'))
print (df)
            id                time  machine_id  speed_gps_kph   latitude  \
0  14641931007 2015-11-14 04:46:40       10051              3 -36.725578   
1  14642568129 2015-11-14 05:12:41       10051             13 -36.769465   
2  14641876524 2015-11-14 06:44:30       10051              0 -36.723136   
3  14642262476 2015-11-14 07:00:47       10051             17 -36.747435   
4  14641991113 2015-11-15 17:49:43       10051              6 -36.728260   

    longitude dayweather  
0  174.714820    Clouds2  
1  174.741590    Clouds3  
2  174.714432    Clouds4  
3  174.723397    Clouds4  
4  174.715083        NaN
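The speed data in the question is not in time order, and merge_asof raises an error on unsorted keys. The following is a minimal, self-contained sketch of the same idea that sorts first and also shows direction='nearest' as an alternative to the default backward match. The frame names speed and weather_obs and the trimmed sample rows are illustrative assumptions, not part of the original post.

```python
import pandas as pd

# Illustrative sample data modelled on the question (names are hypothetical).
speed = pd.DataFrame({
    "time": pd.to_datetime([
        "2015-11-14 06:44:30",
        "2015-11-14 04:46:40",
        "2015-11-15 17:49:43",
        "2015-11-14 05:12:41",
    ]),
    "speed_gps_kph": [0, 3, 6, 13],
})
weather_obs = pd.DataFrame({
    "time": pd.to_datetime([
        "2015-11-14 01:00:00",
        "2015-11-14 03:00:00",
        "2015-11-14 05:00:00",
        "2015-11-14 06:00:00",
        "2015-11-14 08:00:00",
    ]),
    "dayweather": ["Clouds1", "Clouds2", "Clouds3", "Clouds4", "Rain"],
})

# merge_asof requires both key columns to be sorted.
speed_sorted = speed.sort_values("time").reset_index(drop=True)
weather_sorted = weather_obs.sort_values("time")

# Backward match (the default): take the latest reading at or before each
# speed record, but only if it is at most 2 hours old.
merged = pd.merge_asof(
    speed_sorted,
    weather_sorted,
    on="time",
    direction="backward",
    tolerance=pd.Timedelta(hours=2),
)

# Nearest match: take whichever reading is closest in either direction.
nearest = pd.merge_asof(
    speed_sorted,
    weather_sorted,
    on="time",
    direction="nearest",
    tolerance=pd.Timedelta(hours=2),
)

print(merged)
print(nearest)
```

With direction='nearest', the 04:46:40 record picks up the 05:00 reading instead of the 03:00 one, so the choice depends on whether a reading is meant to describe the period after it or the period around it.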

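Another solution is to resample the weather data to an hourly Series and forward-fill it (first is necessary because the column is text), then map the time column, truncated to the hour with floor, onto that Series: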

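# hourly lookup Series: hours without a reading take the previous reading
m = dataframe.set_index('time').resample('h')['dayweather'].first().ffill()
# truncate each speed timestamp to the hour and look it up in m
weather['dayweather'] = weather['time'].dt.floor('h').map(m)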
print (weather)
            id                time  machine_id  speed_gps_kph   latitude  \
0  14641931007 2015-11-14 04:46:40       10051              3 -36.725578   
1  14642568129 2015-11-14 05:12:41       10051             13 -36.769465   
2  14641876524 2015-11-14 06:44:30       10051              0 -36.723136   
3  14642262476 2015-11-14 07:00:47       10051             17 -36.747435   
4  14641991113 2015-11-15 17:49:43       10051              6 -36.728260   

    longitude dayweather  
0  174.714820    Clouds2  
1  174.741590    Clouds3  
2  174.714432    Clouds4  
3  174.723397    Clouds4  
4  174.715083        NaN
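One difference from merge_asof is that a plain ffill has no tolerance: inside the resampled range, a reading is carried forward across a gap of any length. A minimal sketch of one way to bound that, assuming a cap of one filled hour is wanted, is to pass limit to ffill. The frame names, the extra 12:00 reading and the speed timestamps below are made up for illustration, and the lowercase 'h' alias is the spelling preferred by recent pandas releases.

```python
import pandas as pd

# Illustrative weather readings with a long gap between 08:00 and 12:00
# (the 12:00 "Sun" row is invented for this sketch).
weather_obs = pd.DataFrame({
    "time": pd.to_datetime([
        "2015-11-14 01:00:00",
        "2015-11-14 03:00:00",
        "2015-11-14 05:00:00",
        "2015-11-14 06:00:00",
        "2015-11-14 08:00:00",
        "2015-11-14 12:00:00",
    ]),
    "dayweather": ["Clouds1", "Clouds2", "Clouds3", "Clouds4", "Rain", "Sun"],
})
speed = pd.DataFrame({
    "time": pd.to_datetime([
        "2015-11-14 04:46:40",
        "2015-11-14 09:15:00",
        "2015-11-14 10:30:00",
        "2015-11-15 17:49:43",
    ]),
})

# Hourly lookup table; limit=1 fills at most one missing hour after a reading,
# so a reading covers its own hour plus the next one and nothing further.
hourly = (
    weather_obs.set_index("time")["dayweather"]
    .resample("h")
    .first()
    .ffill(limit=1)
)

# Truncate each speed timestamp to the hour and look it up.
# Hours outside the table, or left unfilled, come back as NaN.
speed["dayweather"] = speed["time"].dt.floor("h").map(hourly)
print(speed)
```

Here the 09:15 record still gets the 08:00 reading, while the 10:30 record stays empty because it is more than one hourly slot past the last reading.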
