Here is my reproducible data:
raw_data = {'file': [123, 342, 223, 134, 235, 233],
            'identity': [12, 12, 12, 12, 14, 14],
            'line': [1, 2, 3, 4, 5, 6],
            'date': ['10/27/2013', '10/27/2013', '10/27/2013', '10/27/2013', '10/20/2013', '10/20/2013'],
            'time': ['13:20:00', '13:20:30', '13:21:00', '13:21:30', '15:40:00', '15:40:30']}
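Now, given parameters such as 'identity'=12, 'date'=10/27/2013 and 'time'=13:20:21, I want to create a new data frame that, among the rows matching the given identity and date, selects the row whose time is closest to the time parameter (i.e. has the smallest time difference).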
For example, for the parameters 'identity'=12, 'date'=10/27/2013 and 'time'=13:20:21, the expected result is:
identity  date        time      difference (s)
12        10/27/2013  13:20:30  9
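A minimal vectorized sketch of the requested output, assuming pandas is available and that 'date' and 'time' always use the MM/DD/YYYY and HH:MM:SS formats shown above. The function name `closest_row` is hypothetical. It parses date and time together with `pd.to_datetime`, takes the absolute difference to the target, and returns a one-row DataFrame with the difference in seconds.

```python
import pandas as pd

raw_data = {'file': [123, 342, 223, 134, 235, 233],
            'identity': [12, 12, 12, 12, 14, 14],
            'line': [1, 2, 3, 4, 5, 6],
            'date': ['10/27/2013', '10/27/2013', '10/27/2013', '10/27/2013', '10/20/2013', '10/20/2013'],
            'time': ['13:20:00', '13:20:30', '13:21:00', '13:21:30', '15:40:00', '15:40:30']}


def closest_row(df, identity, date, time):
    """Hypothetical helper: among rows with the given identity and date,
    return the row whose time is closest to `time`, as a one-row DataFrame."""
    subset = df[(df['identity'] == identity) & (df['date'] == date)]
    fmt = '%m/%d/%Y %H:%M:%S'  # assumed date/time formats
    stamps = pd.to_datetime(subset['date'] + ' ' + subset['time'], format=fmt)
    target = pd.to_datetime(date + ' ' + time, format=fmt)
    # absolute difference in whole seconds for every matching row
    diff = (stamps - target).abs().dt.total_seconds().astype(int)
    best = diff.idxmin()
    result = subset.loc[[best], ['identity', 'date', 'time']].copy()
    result['difference'] = diff[best]
    return result


df = pd.DataFrame(raw_data)
print(closest_row(df, 12, '10/27/2013', '13:20:21'))
```

If no row matches the identity and date, `idxmin` raises a ValueError on the empty selection, so a caller may want to check `subset.empty` first.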
Answer:
Since you didn't show us what you have tried, the code may not look exactly like yours, but it should make clear how to solve the problem:
import pandas as pd
from datetime import datetime

df = pd.DataFrame(raw_data)
# keep only the rows matching the identity and date parameters
cond = df['identity'] == 12
cond2 = df['date'] == '10/27/2013'
td = datetime.strptime('13:20:21', '%H:%M:%S')
# series of absolute time differences for the matching rows
min_time_diff = abs(df.loc[cond & cond2, 'time'].apply(lambda x: datetime.strptime(x, '%H:%M:%S') - td))
# return the row with the minimum time difference
idx = min_time_diff.idxmin()
out = df.loc[idx].copy()
# total_seconds() also handles differences longer than a minute, unlike components.seconds
out['difference'] = int(min_time_diff[idx].total_seconds())
Output:
date        10/27/2013
file        342
identity    12
line        2
time        13:20:30
difference  9
Name: 1, dtype: object
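If many (identity, date, time) queries have to be answered at once, one alternative (not part of the answer above) is `pd.merge_asof` with `direction='nearest'`, using `by` so that matches stay within the same identity and date. This is a sketch assuming the same date/time formats; the `queries` frame, its `query_time` column and its values are made up for illustration.

```python
import pandas as pd

raw_data = {'file': [123, 342, 223, 134, 235, 233],
            'identity': [12, 12, 12, 12, 14, 14],
            'line': [1, 2, 3, 4, 5, 6],
            'date': ['10/27/2013', '10/27/2013', '10/27/2013', '10/27/2013', '10/20/2013', '10/20/2013'],
            'time': ['13:20:00', '13:20:30', '13:21:00', '13:21:30', '15:40:00', '15:40:30']}
fmt = '%m/%d/%Y %H:%M:%S'  # assumed date/time formats

df = pd.DataFrame(raw_data)
df['stamp'] = pd.to_datetime(df['date'] + ' ' + df['time'], format=fmt)
# the 'on' key appears only once in the result, so keep a copy of the matched timestamp
df['match_stamp'] = df['stamp']

# hypothetical batch of query parameters
queries = pd.DataFrame({'identity': [12, 14],
                        'date': ['10/27/2013', '10/20/2013'],
                        'query_time': ['13:20:21', '15:40:20']})
queries['stamp'] = pd.to_datetime(queries['date'] + ' ' + queries['query_time'], format=fmt)

# merge_asof requires both sides sorted on the 'on' key;
# `by` restricts each match to rows with the same identity and date
matched = pd.merge_asof(queries.sort_values('stamp'), df.sort_values('stamp'),
                        on='stamp', by=['identity', 'date'], direction='nearest')
matched['difference'] = (matched['match_stamp'] - matched['stamp']).abs().dt.total_seconds().astype(int)
print(matched[['identity', 'date', 'query_time', 'time', 'difference']])
```

The result rows follow the sorted order of `queries`, not its original order, so sort or re-index afterwards if that matters.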