Say I have two dataframes:
df1:
+-------------------+----+
|          Timestamp|data|
+-------------------+----+
|2019/04/02 11:00:01| 111|
|2019/04/02 11:00:15| 222|
|2019/04/02 11:00:29| 333|
|2019/04/02 11:00:30| 444|
+-------------------+----+

df2:
+-------------------+-----+
|          Timestamp|stuff|
+-------------------+-----+
|2019/04/02 11:00:14|  101|
|2019/04/02 11:00:15|  202|
|2019/04/02 11:00:16|  303|
|2019/04/02 11:00:30|  404|
|2019/04/02 11:00:31|  505|
+-------------------+-----+
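Without looping over every row of df2, I'm trying to merge the two dataframes on the timestamp. For each row in df2, it should "add" the data from df1 for that particular time. In this example, the resulting dataframe would be: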
Adding df1 data to df2:
+-------------------+-----+----+
|          Timestamp|stuff|data|
+-------------------+-----+----+
|2019/04/02 11:00:14|  101| 111|
|2019/04/02 11:00:15|  202| 222|
|2019/04/02 11:00:16|  303| 222|
|2019/04/02 11:00:30|  404| 444|
|2019/04/02 11:00:31|  505|None|
+-------------------+-----+----+
Looping over every row of df2 and comparing it against every row of df1 is very inefficient. Is there another way?
Answer 0 (score: 2)
Use merge_asof:
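import pandas as pd

# merge_asof needs real datetimes, sorted ascending on the key in both frames
df1['Timestamp'] = pd.to_datetime(df1['Timestamp'])
df2['Timestamp'] = pd.to_datetime(df2['Timestamp'])
# for each row of df2, take the last df1 row whose Timestamp is <= it
df = pd.merge_asof(df2, df1, on='Timestamp')
print(df)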
Timestamp stuff data
0 2019-04-02 11:00:14 101 111
1 2019-04-02 11:00:15 202 222
2 2019-04-02 11:00:16 303 222
3 2019-04-02 11:00:30 404 444
4 2019-04-02 11:00:31 505 444
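For a self-contained check, here is a minimal sketch that rebuilds the two sample frames from the question's tables and runs the same backward merge_asof. It also cross-checks the result with np.searchsorted, which finds, for each df2 timestamp, the position of the last df1 timestamp at or before it. That cross-check only illustrates what a backward as-of join computes; it is not a claim about how pandas implements merge_asof internally. The variable names (asof, pos, manual) and the explicit format string are my own assumptions.

```python
import numpy as np
import pandas as pd

# Sample frames rebuilt from the tables in the question
df1 = pd.DataFrame({
    "Timestamp": ["2019/04/02 11:00:01", "2019/04/02 11:00:15",
                  "2019/04/02 11:00:29", "2019/04/02 11:00:30"],
    "data": [111, 222, 333, 444],
})
df2 = pd.DataFrame({
    "Timestamp": ["2019/04/02 11:00:14", "2019/04/02 11:00:15",
                  "2019/04/02 11:00:16", "2019/04/02 11:00:30",
                  "2019/04/02 11:00:31"],
    "stuff": [101, 202, 303, 404, 505],
})

# merge_asof needs a datetime (or numeric) key, sorted ascending in both frames
fmt = "%Y/%m/%d %H:%M:%S"
df1["Timestamp"] = pd.to_datetime(df1["Timestamp"], format=fmt)
df2["Timestamp"] = pd.to_datetime(df2["Timestamp"], format=fmt)

# Backward as-of join: each df2 row gets the last df1 row at or before it
asof = pd.merge_asof(df2, df1, on="Timestamp")
print(asof)

# Cross-check: searchsorted(side="right") - 1 is, for each df2 timestamp,
# the index of the last df1 timestamp <= it (-1 would mean "no match";
# indexing with -1 is harmless because np.where replaces it with NaN)
t1 = df1["Timestamp"].to_numpy()
t2 = df2["Timestamp"].to_numpy()
pos = np.searchsorted(t1, t2, side="right") - 1
manual = np.where(pos >= 0, df1["data"].to_numpy()[pos], np.nan)
print(manual)
```

Note that the backward join fills the 11:00:31 row with 444, the most recent df1 value, rather than None as in the expected output in the question.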
You can also swap the order, putting df1 on the left and df2 on the right, and add the parameter direction='forward':
# for each row of df1, take the first df2 row whose Timestamp is >= it
df = pd.merge_asof(df1, df2, on='Timestamp', direction='forward')
print(df)
Timestamp data stuff
0 2019-04-02 11:00:01 111 101
1 2019-04-02 11:00:15 222 202
2 2019-04-02 11:00:29 333 404
3 2019-04-02 11:00:30 444 404
# default: direction='backward'
df = pd.merge_asof(df1, df2, on='Timestamp')
print(df)
Timestamp data stuff
0 2019-04-02 11:00:01 111 NaN
1 2019-04-02 11:00:15 222 202.0
2 2019-04-02 11:00:29 333 303.0
3 2019-04-02 11:00:30 444 404.0
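A small sketch, assuming the same sample data, that runs merge_asof with df1 on the left for each of the three values the direction parameter accepts ('backward', 'forward' and 'nearest'), so their effect can be compared side by side. The names results and merged are just illustrative.

```python
import pandas as pd

fmt = "%Y/%m/%d %H:%M:%S"
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2019/04/02 11:00:01", "2019/04/02 11:00:15",
                                 "2019/04/02 11:00:29", "2019/04/02 11:00:30"], format=fmt),
    "data": [111, 222, 333, 444],
})
df2 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2019/04/02 11:00:14", "2019/04/02 11:00:15",
                                 "2019/04/02 11:00:16", "2019/04/02 11:00:30",
                                 "2019/04/02 11:00:31"], format=fmt),
    "stuff": [101, 202, 303, 404, 505],
})

# Run the same df1-left as-of join once per direction and keep the matched "stuff"
results = {}
for direction in ("backward", "forward", "nearest"):
    merged = pd.merge_asof(df1, df2, on="Timestamp", direction=direction)
    results[direction] = merged["stuff"].tolist()
    print(direction, results[direction])
```

With this data 'nearest' gives the same matches as 'forward': 11:00:01 has no earlier df2 row, and 11:00:29 is 1 second from 11:00:30 but 13 seconds from 11:00:16.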
Answer 1 (score: 0)
import pandas as pd
pd.merge(df1, df2, left_on=['Timestamp'], right_on=['Timestamp'])
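The plain merge in this answer is an exact-key inner join, so it keeps only timestamps present in both frames. The sketch below, assuming the sample data from the question, puts it next to the as-of join so the difference in row counts is visible. The names exact and asof are illustrative.

```python
import pandas as pd

fmt = "%Y/%m/%d %H:%M:%S"
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2019/04/02 11:00:01", "2019/04/02 11:00:15",
                                 "2019/04/02 11:00:29", "2019/04/02 11:00:30"], format=fmt),
    "data": [111, 222, 333, 444],
})
df2 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2019/04/02 11:00:14", "2019/04/02 11:00:15",
                                 "2019/04/02 11:00:16", "2019/04/02 11:00:30",
                                 "2019/04/02 11:00:31"], format=fmt),
    "stuff": [101, 202, 303, 404, 505],
})

# Exact-match inner join: only rows whose Timestamp appears in both frames survive
exact = pd.merge(df1, df2, left_on=["Timestamp"], right_on=["Timestamp"])

# As-of join: every df2 row is kept and gets the most recent df1 value
asof = pd.merge_asof(df2, df1, on="Timestamp")

print(exact)
print(len(exact), "rows vs", len(asof), "rows")
```

Only 11:00:15 and 11:00:30 appear in both frames, so the exact merge returns 2 rows, while merge_asof keeps all 5 rows of df2.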