Joining two different dataframes on timestamp

Time: 2019-04-18 13:09:37

Tags: python pandas dataframe

Say I have two dataframes:

df1:                          df2:
+-------------------+----+    +-------------------+-----+
|  Timestamp        |data|    |  Timestamp        |stuff|
+-------------------+----+    +-------------------+-----+
|2019/04/02 11:00:01| 111|    |2019/04/02 11:00:14|  101|
|2019/04/02 11:00:15| 222|    |2019/04/02 11:00:15|  202|
|2019/04/02 11:00:29| 333|    |2019/04/02 11:00:16|  303|
|2019/04/02 11:00:30| 444|    |2019/04/02 11:00:30|  404|
+-------------------+----+    |2019/04/02 11:00:31|  505|
                              +-------------------+-----+

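Without looping over every row of df2, I am trying to join the two dataframes based on the timestamp. So for every row in df2, it would "add" the data from df1 at that particular time. In this example, the resulting dataframe would be: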

Adding df1 data to df2:
+-------------------+-----+----+
|  Timestamp        |stuff|data|
+-------------------+-----+----+
|2019/04/02 11:00:14|  101| 111|
|2019/04/02 11:00:15|  202| 222|
|2019/04/02 11:00:16|  303| 222|
|2019/04/02 11:00:30|  404| 444|
|2019/04/02 11:00:31|  505|None|
+-------------------+-----+----+

Looping over every row of df2 and comparing it against every row of df1 is very inefficient. Is there another way?

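2 answers:

Answer 0: (score: 2)

Use merge_asof. The Timestamp column must be a datetime dtype, and both frames must be sorted by it: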

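import pandas as pd

# merge_asof needs a datetime (or numeric) key, so parse the strings first
df1['Timestamp'] = pd.to_datetime(df1['Timestamp'])
df2['Timestamp'] = pd.to_datetime(df2['Timestamp'])

# for each df2 row, take the last df1 row whose Timestamp is <= the df2 Timestamp
df = pd.merge_asof(df2, df1, on='Timestamp')
print(df)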
            Timestamp  stuff  data
0 2019-04-02 11:00:14    101   111
1 2019-04-02 11:00:15    202   222
2 2019-04-02 11:00:16    303   222
3 2019-04-02 11:00:30    404   444
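
The four-row output above appears to come from reading the misaligned 11:00:31 / 505 row as part of df1. For a self-contained check, here is a minimal sketch that rebuilds the frames with that row in df2 (an assumption about how the question's table should be read) and runs the same call. Under that reading the result has five rows, and the 11:00:31 row picks up 444 rather than None, because the default backward search takes the most recent df1 row at or before each df2 timestamp.

```python
import pandas as pd

# Frames rebuilt from the question's tables
# (assumption: the 11:00:31 / 505 row belongs to df2)
df1 = pd.DataFrame({
    "Timestamp": ["2019/04/02 11:00:01", "2019/04/02 11:00:15",
                  "2019/04/02 11:00:29", "2019/04/02 11:00:30"],
    "data": [111, 222, 333, 444],
})
df2 = pd.DataFrame({
    "Timestamp": ["2019/04/02 11:00:14", "2019/04/02 11:00:15",
                  "2019/04/02 11:00:16", "2019/04/02 11:00:30",
                  "2019/04/02 11:00:31"],
    "stuff": [101, 202, 303, 404, 505],
})

# Parse the strings and make sure both frames are sorted on the key
df1["Timestamp"] = pd.to_datetime(df1["Timestamp"])
df2["Timestamp"] = pd.to_datetime(df2["Timestamp"])
df1 = df1.sort_values("Timestamp")
df2 = df2.sort_values("Timestamp")

# Backward as-of join: one output row per df2 row
result = pd.merge_asof(df2, df1, on="Timestamp")
print(result)
```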

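You can also swap the order of df1 and df2 and add the parameter direction='forward':

# for each df1 row, take the first df2 row whose Timestamp is >= the df1 Timestamp
df = pd.merge_asof(df1, df2, on='Timestamp', direction='forward')
print(df)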
            Timestamp  data  stuff
0 2019-04-02 11:00:01   111  101.0
1 2019-04-02 11:00:15   222  202.0
2 2019-04-02 11:00:29   333  404.0
3 2019-04-02 11:00:30   444  404.0
4 2019-04-02 11:00:31   505    NaN

# default direction='backward'
df = pd.merge_asof(df1, df2, on='Timestamp')
print(df)
            Timestamp  data  stuff
0 2019-04-02 11:00:01   111    NaN
1 2019-04-02 11:00:15   222  202.0
2 2019-04-02 11:00:29   333  303.0
3 2019-04-02 11:00:30   444  404.0
4 2019-04-02 11:00:31   505  404.0
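
If a stale match is worse than no match, merge_asof also accepts a tolerance argument that caps how far back (or forward) it may look. A minimal sketch, assuming the frames from the question's tables and a 5-second window picked purely for illustration:

```python
import pandas as pd

# Frames rebuilt from the question's tables (assumption: 11:00:31 / 505 is a df2 row)
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2019/04/02 11:00:01", "2019/04/02 11:00:15",
                                 "2019/04/02 11:00:29", "2019/04/02 11:00:30"]),
    "data": [111, 222, 333, 444],
})
df2 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2019/04/02 11:00:14", "2019/04/02 11:00:15",
                                 "2019/04/02 11:00:16", "2019/04/02 11:00:30",
                                 "2019/04/02 11:00:31"]),
    "stuff": [101, 202, 303, 404, 505],
})

# Only accept a df1 row that is at most 5 seconds older than the df2 row;
# the 5-second window is a hypothetical value, not something from the question
limited = pd.merge_asof(df2, df1, on="Timestamp",
                        tolerance=pd.Timedelta("5s"))
print(limited)
```

Here the 11:00:14 row is left without data, since the closest earlier df1 row (11:00:01) is 13 seconds old, while the other rows still find a match within the window.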

Answer 1: (score: 0)

import pandas as pd

# plain merge defaults to an inner join on exact equality of Timestamp
pd.merge(df1, df2, left_on=['Timestamp'], right_on=['Timestamp'])
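
To see how this differs from the as-of join on the question's data, here is a small sketch. The frames are rebuilt from the question's tables, and how='left' is an addition here (not part of the answer above) so that every df2 row is kept:

```python
import pandas as pd

# Frames rebuilt from the question's tables (assumption: 11:00:31 / 505 is a df2 row)
df1 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2019/04/02 11:00:01", "2019/04/02 11:00:15",
                                 "2019/04/02 11:00:29", "2019/04/02 11:00:30"]),
    "data": [111, 222, 333, 444],
})
df2 = pd.DataFrame({
    "Timestamp": pd.to_datetime(["2019/04/02 11:00:14", "2019/04/02 11:00:15",
                                 "2019/04/02 11:00:16", "2019/04/02 11:00:30",
                                 "2019/04/02 11:00:31"]),
    "stuff": [101, 202, 303, 404, 505],
})

# Inner join: only timestamps present in both frames survive
inner = pd.merge(df1, df2, on="Timestamp")

# Left join from df2: every df2 row is kept, data is NaN without an exact match
exact = pd.merge(df2, df1, on="Timestamp", how="left")

print(inner)
print(exact)
```

Only 11:00:15 and 11:00:30 exist in both frames, so the inner join returns two rows, and the left join leaves data empty for 11:00:14, 11:00:16 and 11:00:31. Carrying the last known df1 value forward to those rows is what merge_asof does.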