我正在尝试根据时间戳加入Pandas中的两个表。基本上结构看起来像这样:
表2
Timestamp Truck MineX MineY
2016-08-27 01:10 CT77 -11346.36655 -650404.405
2016-08-27 01:12 CT45 -11596.88137 -648294.056
2016-08-27 01:13 CT67 -11953.16118 -648325.114
2016-08-27 01:13 CT75 -11326.54075 -650447.462
2016-08-27 01:14 CT79 -11380.27834 -650425.968
2016-08-27 01:15 CT26 -9493.153286 -652313.633
2016-08-27 01:16 CT73 -11527.47602 -650210.723
2016-08-27 01:16 CT40 -11596.90867 -648260.214
2016-08-27 01:17 CT26 -9493.153286 -652313.633
2016-08-27 01:17 CT80 -11363.34558 -650385.959
2016-08-27 01:17 CT72 -11527.47355 -650213.8
表1
Truck LoadLocation Tonnes ArriveTimestamp
CT70 338-001 261 2016-02-21 00:23
CT66 338-001 271 2016-02-21 00:31
CT62 338-001 264 2016-02-21 00:45
CT73 338-001 254 2016-02-21 00:54
CT71 338-001 250 2016-02-21 01:04
CT39 338-001 182.172 2016-02-21 01:11
CT62 338-001 285 2016-02-21 01:19
CT70 338-001 282 2016-02-21 01:25
CT73 338-001 250 2016-02-21 01:30
CT73 338-001 275 2016-02-21 01:35
CT64 338-001 253 2016-02-21 01:42
表1和表2需要连接,其中Timestamp和ArriveTimeStamp彼此相距一分钟,且Truck ID相同。如果没有匹配
,则抛出表2中的记录,首选左连接答案 0 :(得分:1)
您可以使用merge
:
df = pd.merge(df1, df2, on='Truck', how='left')
print (df)
Truck LoadLocation Tonnes ArriveTimestamp Timestamp \
0 CT70 338-001 261.000 2016-02-21 00:23:00 NaT
1 CT66 338-001 271.000 2016-02-21 00:31:00 NaT
2 CT62 338-001 264.000 2016-02-21 00:45:00 NaT
3 CT73 338-001 254.000 2016-02-21 00:54:00 2016-08-27 01:16:00
4 CT71 338-001 250.000 2016-02-21 01:04:00 NaT
5 CT39 338-001 182.172 2016-02-21 01:11:00 NaT
6 CT62 338-001 285.000 2016-02-21 01:19:00 NaT
7 CT70 338-001 282.000 2016-02-21 01:25:00 NaT
8 CT73 338-001 250.000 2016-02-21 01:30:00 2016-08-27 01:16:00
9 CT73 338-001 275.000 2016-02-21 01:35:00 2016-08-27 01:16:00
10 CT64 338-001 253.000 2016-02-21 01:42:00 NaT
MineX MineY
0 NaN NaN
1 NaN NaN
2 NaN NaN
3 -11527.47602 -650210.723
4 NaN NaN
5 NaN NaN
6 NaN NaN
7 NaN NaN
8 -11527.47602 -650210.723
9 -11527.47602 -650210.723
10 NaN NaN
boolean indexing
,其中datetimes
的过滤器绝对差异 - 样本返回空DataFrame
:
print ((df.Timestamp - df.ArriveTimestamp).astype('timedelta64[s]'))
0 NaN
1 NaN
2 NaN
3 16244520.0
4 NaN
5 NaN
6 NaN
7 NaN
8 16242360.0
9 16242060.0
10 NaN
dtype: float64
print ((df.Timestamp - df.ArriveTimestamp).astype('timedelta64[s]').abs() < 60)
0 False
1 False
2 False
3 False
4 False
5 False
6 False
7 False
8 False
9 False
10 False
dtype: bool
Empty DataFrame
print (df[(df.Timestamp - df.ArriveTimestamp).astype('timedelta64[s]').abs() < 60])
Empty DataFrame
Columns: [Truck, LoadLocation, Tonnes, ArriveTimestamp, Timestamp, MineX, MineY]
Index: []
答案 1 :(得分:0)
import pandas as pd
df=pd.merge(left=Table1,right=Table2, left_on=['TimeStamp1'], right_on=['TimeStamp2'])