熊猫基于时间条件加入两个表

时间:2016-09-27 09:28:00

标签: python pandas

我正在尝试根据时间戳加入Pandas中的两个表。基本上结构看起来像这样:

表2

Timestamp          Truck     MineX           MineY
2016-08-27 01:10    CT77    -11346.36655    -650404.405
2016-08-27 01:12    CT45    -11596.88137    -648294.056
2016-08-27 01:13    CT67    -11953.16118    -648325.114
2016-08-27 01:13    CT75    -11326.54075    -650447.462
2016-08-27 01:14    CT79    -11380.27834    -650425.968
2016-08-27 01:15    CT26    -9493.153286    -652313.633
2016-08-27 01:16    CT73    -11527.47602    -650210.723
2016-08-27 01:16    CT40    -11596.90867    -648260.214
2016-08-27 01:17    CT26    -9493.153286    -652313.633
2016-08-27 01:17    CT80    -11363.34558    -650385.959
2016-08-27 01:17    CT72    -11527.47355    -650213.8

表1

Truck   LoadLocation    Tonnes  ArriveTimestamp
CT70    338-001       261         2016-02-21 00:23
CT66    338-001       271         2016-02-21 00:31
CT62    338-001       264         2016-02-21 00:45
CT73    338-001       254         2016-02-21 00:54
CT71    338-001       250         2016-02-21 01:04
CT39    338-001       182.172     2016-02-21 01:11
CT62    338-001       285         2016-02-21 01:19
CT70    338-001       282         2016-02-21 01:25
CT73    338-001       250         2016-02-21 01:30
CT73    338-001       275         2016-02-21 01:35
CT64    338-001       253         2016-02-21 01:42

表1和表2需要连接,其中Timestamp和ArriveTimeStamp彼此相距一分钟,且Truck ID相同。如果没有匹配

,则抛出表2中的记录,首选左连接

2 个答案:

答案 0 :(得分:1)

您可以使用merge

df = pd.merge(df1, df2, on='Truck', how='left')
print (df)
   Truck LoadLocation   Tonnes     ArriveTimestamp           Timestamp  \
0   CT70      338-001  261.000 2016-02-21 00:23:00                 NaT   
1   CT66      338-001  271.000 2016-02-21 00:31:00                 NaT   
2   CT62      338-001  264.000 2016-02-21 00:45:00                 NaT   
3   CT73      338-001  254.000 2016-02-21 00:54:00 2016-08-27 01:16:00   
4   CT71      338-001  250.000 2016-02-21 01:04:00                 NaT   
5   CT39      338-001  182.172 2016-02-21 01:11:00                 NaT   
6   CT62      338-001  285.000 2016-02-21 01:19:00                 NaT   
7   CT70      338-001  282.000 2016-02-21 01:25:00                 NaT   
8   CT73      338-001  250.000 2016-02-21 01:30:00 2016-08-27 01:16:00   
9   CT73      338-001  275.000 2016-02-21 01:35:00 2016-08-27 01:16:00   
10  CT64      338-001  253.000 2016-02-21 01:42:00                 NaT   

          MineX       MineY  
0           NaN         NaN  
1           NaN         NaN  
2           NaN         NaN  
3  -11527.47602 -650210.723  
4           NaN         NaN  
5           NaN         NaN  
6           NaN         NaN  
7           NaN         NaN  
8  -11527.47602 -650210.723  
9  -11527.47602 -650210.723  
10          NaN         NaN  

boolean indexing,其中datetimes的过滤器绝对差异 - 样本返回空DataFrame

print ((df.Timestamp - df.ArriveTimestamp).astype('timedelta64[s]'))
0            NaN
1            NaN
2            NaN
3     16244520.0
4            NaN
5            NaN
6            NaN
7            NaN
8     16242360.0
9     16242060.0
10           NaN
dtype: float64

print ((df.Timestamp - df.ArriveTimestamp).astype('timedelta64[s]').abs() < 60)
0     False
1     False
2     False
3     False
4     False
5     False
6     False
7     False
8     False
9     False
10    False
dtype: bool
Empty DataFrame

print (df[(df.Timestamp - df.ArriveTimestamp).astype('timedelta64[s]').abs() < 60])
Empty DataFrame
Columns: [Truck, LoadLocation, Tonnes, ArriveTimestamp, Timestamp, MineX, MineY]
Index: []

答案 1 :(得分:0)

import pandas as pd
df=pd.merge(left=Table1,right=Table2, left_on=['TimeStamp1'], right_on=['TimeStamp2'])