Table 1 has 800,000 entries
End_time DAY Exceed C_time stn max start_time
2019-12-26 12:29:34 PROD -41.9 21.1 501 21.1 2019-12-26 12:29:13
2019-12-26 12:30:59 PROD -10.3 52.7 501 52.7 2019-12-26 12:30:07
2019-12-26 12:32:36 PROD -35.8 27.2 503 27.2 2019-12-26 12:32:09
2019-12-26 12:33:54 PROD -53.3 9.7 504 9.7 2019-12-26 12:33:45
2019-12-26 12:35:04 PROD -24.6 38.4 505 38.4 2019-12-26 12:34:26
Table 2 has 300,000 entries
AlarmMessage D_time Priority Station EquipID Active Quality LineName AlarmInTimeStamp
S501LH_B_RR_BT 2 1 501 2200505 True 192 BC1 2019-12-26 12:29:16.5608495
SHT_B_S503_BEAM 21 1 503 2300249 True 192 BC1 2019-12-26 12:32:20.0634165
S503LH_B_RR_T 2 1 503 2200505 True 192 BC1 2019-12-26 12:32:25.6494806
SHT_B_S504_ 21 1 504 2300256 True 192 BC1 2019-12-26 12:33:50.6719676
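If a Table 2 "AlarmInTimeStamp" falls between a Table 1 "start_time" and "End_time", and the station is the same in both tables ("stn" in Table 1, "Station" in Table 2), the rows should be merged, so that I can finally compute the sum of the timestamps and D_time.
The output should look like this: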
End_time DAY Exceed C_time stn max start_time AlarmMessage D_time
2019-12-26 12:29:34 PROD -41.9 21.1 501 21.1 2019-12-26 12:29:13 S501LH_B_RR_BT 2
2019-12-26 12:30:59 PROD -10.3 52.7 501 52.7 2019-12-26 12:30:07 - -
2019-12-26 12:32:36 PROD -35.8 27.2 503 27.2 2019-12-26 12:32:09 SHT_B_S503_BEAM 21
S503LH_B_RR_T 2
2019-12-26 12:33:54 PROD -53.3 9.7 504 9.7 2019-12-26 12:33:45 SHT_B_S504_ 21
2019-12-26 12:35:04 PROD -24.6 38.4 505 38.4 2019-12-26 12:34:26 - -
Answer 0 (score: 0)
You can solve this with pandas and a little matrix multiplication:
import pandas as pd
# Attempt #5: Use python and the pandas package
# create the pandas Data Frames (kind of like R data.frame)
myDataDF = pd.DataFrame({'Record':range(1,6), 'SomeValue':[10, 8, 14, 6, 2]})
linkTableDF = pd.DataFrame({'ValueOfInterest':['a', 'b', 'c'], 'LowerBound': [1, 4, 10],
'UpperBound':[3, 5, 16]})
# set the index of the linkTable (kind of like setting row names)
linkTableDF = linkTableDF.set_index('ValueOfInterest')
# now apply a function to each row of the linkTable
# this function checks if any of the values in myData are between the upper
# and lower bound of a specific row thus returning 5 values (length of myData)
mask = linkTableDF.apply(lambda r: myDataDF.SomeValue.between(r['LowerBound'],
r['UpperBound']), axis=1)
# mask is a 3 (length of linkTable) by 5 matrix of True/False values
# by transposing it we get the row names (the ValueOfInterest) as the column names
mask = mask.T
# we can then matrix multiply mask with its column names
myDataDF['ValueOfInterest'] = mask.dot(mask.columns)
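The last step can look like magic, so here is a minimal sketch of why it works, using a made-up three-row mask rather than the data above. In Python, True * 'a' is 'a' and False * 'a' is the empty string, so the dot product of a boolean mask with its own column names concatenates the names of the True columns in each row.

```python
import pandas as pd

# Hypothetical mask: rows are records, columns are candidate labels.
mask = pd.DataFrame(
    {"a": [True, False, True], "b": [False, False, True]},
    index=[1, 2, 3],
)

# Each row becomes the concatenation of the labels whose cell is True.
labels = mask.dot(mask.columns)
print(labels.tolist())  # ['a', '', 'ab']
```

Note the third row: when a record falls inside more than one range, the labels are glued together without a separator, and a record with no match gets an empty string.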
For your tables, you could use something like:
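# table and table2 are assumed to be DataFrames holding Table 1 and Table 2
mask = table.apply(lambda r: table2.AlarmInTimeStamp.between(r['start_time'], r['End_time'])
                   & (table2.Station == r['stn']), axis=1)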
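One caveat: with 800,000 and 300,000 rows, that mask would hold roughly 2.4 × 10^11 True/False cells, which is unlikely to fit in memory. A possible alternative, sketched below, is pandas.merge_asof followed by a filter on End_time. This is not the method from the linked article. It assumes the intervals within one station do not overlap and that (stn, start_time) identifies a Table 1 row uniquely; the sample rows are copied from the question, with only the columns needed for the join.

```python
import pandas as pd

table1 = pd.DataFrame({
    "start_time": ["2019-12-26 12:29:13", "2019-12-26 12:30:07", "2019-12-26 12:32:09",
                   "2019-12-26 12:33:45", "2019-12-26 12:34:26"],
    "End_time": ["2019-12-26 12:29:34", "2019-12-26 12:30:59", "2019-12-26 12:32:36",
                 "2019-12-26 12:33:54", "2019-12-26 12:35:04"],
    "stn": [501, 501, 503, 504, 505],
})
table2 = pd.DataFrame({
    "AlarmMessage": ["S501LH_B_RR_BT", "SHT_B_S503_BEAM", "S503LH_B_RR_T", "SHT_B_S504_"],
    "D_time": [2, 21, 2, 21],
    "Station": [501, 503, 503, 504],
    "AlarmInTimeStamp": ["2019-12-26 12:29:16.5608495", "2019-12-26 12:32:20.0634165",
                         "2019-12-26 12:32:25.6494806", "2019-12-26 12:33:50.6719676"],
})

# merge_asof needs both time keys to share one dtype, so cast explicitly.
for col in ("start_time", "End_time"):
    table1[col] = pd.to_datetime(table1[col]).astype("datetime64[ns]")
table2["AlarmInTimeStamp"] = pd.to_datetime(table2["AlarmInTimeStamp"]).astype("datetime64[ns]")

# Both inputs must be sorted by their time key.
intervals = table1.sort_values("start_time")
alarms = table2.rename(columns={"Station": "stn"}).sort_values("AlarmInTimeStamp")

# For each alarm, find the latest interval at the same station
# that started at or before the alarm.
matched = pd.merge_asof(
    alarms[["stn", "AlarmInTimeStamp", "AlarmMessage", "D_time"]],
    intervals[["stn", "start_time", "End_time"]],
    left_on="AlarmInTimeStamp", right_on="start_time",
    by="stn", direction="backward",
)

# Keep only alarms that also fall before the interval's end
# (rows with no interval at all have NaT here and are dropped too).
matched = matched[matched["AlarmInTimeStamp"] <= matched["End_time"]]

# Left-merge back so intervals without any alarm are kept with NaN.
result = table1.merge(
    matched[["stn", "start_time", "AlarmMessage", "D_time"]],
    on=["stn", "start_time"], how="left",
)
print(result)
```

An interval with several alarms (station 503 here) appears once per alarm, and intervals without alarms keep NaN in AlarmMessage and D_time.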
Alternatively, you could also use SQL on the tables.
Source: https://www.mango-solutions.com/in-between-a-rock-and-a-conditional-join/
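As a rough sketch of the SQL route, the conditional join can be written as a LEFT JOIN with BETWEEN. The example below uses Python's built-in sqlite3 module with an in-memory database; the table names table1 and table2 and the reduced column set are assumptions for illustration, and the rows are copied from the question. Timestamps are stored as text, which compares correctly here because every value uses the same "YYYY-MM-DD HH:MM:SS" layout.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE table1 (start_time TEXT, End_time TEXT, stn INTEGER)")
con.execute(
    "CREATE TABLE table2 (AlarmMessage TEXT, D_time INTEGER, "
    "Station INTEGER, AlarmInTimeStamp TEXT)"
)
con.executemany("INSERT INTO table1 VALUES (?, ?, ?)", [
    ("2019-12-26 12:29:13", "2019-12-26 12:29:34", 501),
    ("2019-12-26 12:30:07", "2019-12-26 12:30:59", 501),
    ("2019-12-26 12:32:09", "2019-12-26 12:32:36", 503),
    ("2019-12-26 12:33:45", "2019-12-26 12:33:54", 504),
    ("2019-12-26 12:34:26", "2019-12-26 12:35:04", 505),
])
con.executemany("INSERT INTO table2 VALUES (?, ?, ?, ?)", [
    ("S501LH_B_RR_BT", 2, 501, "2019-12-26 12:29:16.5608495"),
    ("SHT_B_S503_BEAM", 21, 503, "2019-12-26 12:32:20.0634165"),
    ("S503LH_B_RR_T", 2, 503, "2019-12-26 12:32:25.6494806"),
    ("SHT_B_S504_", 21, 504, "2019-12-26 12:33:50.6719676"),
])

# LEFT JOIN keeps every Table 1 row; alarm columns are NULL when nothing matches.
query = """
SELECT t1.start_time, t1.End_time, t1.stn, t2.AlarmMessage, t2.D_time
FROM table1 AS t1
LEFT JOIN table2 AS t2
  ON t2.Station = t1.stn
 AND t2.AlarmInTimeStamp BETWEEN t1.start_time AND t1.End_time
ORDER BY t1.start_time, t2.AlarmInTimeStamp
"""
rows = con.execute(query).fetchall()
for row in rows:
    print(row)
```

For the full tables, an index on the station and timestamp columns would probably be needed to keep the join fast.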