Get the minimum time difference in a column based on parameters

Time: 2018-07-09 15:58:30

Tags: python pandas

Here is my reproducible data:

raw_data = {'file': [123, 342, 223, 134, 235,233], 
            'identity': [12, 12, 12, 12,14,14], 
            'line': [1, 2, 3, 4, 5,6], 
            'date': ['10/27/2013','10/27/2013', '10/27/2013', '10/27/2013', '10/20/2013','10/20/2013'],
            'time': ['13:20:00', '13:20:30', '13:21:00', '13:21:30', '15:40:00','15:40:30']}

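Now, for given parameters, say 'identity'=12, 'date'=10/27/2013 and 'time'=13:20:21, I want to create a new data frame that filters this data frame by the identity and date parameters and selects the row whose time is closest to the time parameter.

For example, for the parameters 'identity'=12, 'date'=10/27/2013 and 'time'=13:20:21, the expected answer is: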

identity  date        time     difference
12       10/27/2013  13:20:30     9

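1 answer:

Answer 0: (score: 1)

Since you didn't show us what you tried, this code may not look exactly the way you need. But it should give you a clear idea of how to solve the problem: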

import pandas as pd
from datetime import datetime

df = pd.DataFrame(raw_data)

# boolean masks for the identity and date parameters
cond = df['identity'] == 12
cond2 = df['date'] == '10/27/2013'

# target time to compare against
td = datetime.strptime('13:20:21', '%H:%M:%S')

# series of absolute time differences for the matching rows
min_time_diff = abs(df.loc[cond & cond2, 'time'].apply(lambda x: datetime.strptime(x, '%H:%M:%S') - td))

# return the row with the minimum time difference (copied so a field can be added safely)
out = df.loc[min_time_diff.idxmin()].copy()

# difference in whole seconds; total_seconds() stays correct for gaps longer than a minute
out['difference'] = int(min_time_diff.min().total_seconds())

OUT:

date        10/27/2013
file               342
identity            12
line                 2
time          13:20:30
difference           9
Name: 1, dtype: object
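
The result above is a Series, while the question asks for a new data frame with the columns identity, date, time and difference. A minimal sketch of one way to get that shape is below. It swaps the per-row strptime calls for pd.to_datetime, and the helper name closest_row is hypothetical, introduced here only for illustration. It assumes dates are formatted as month/day/year and times as hour:minute:second, as in the sample data.

```python
import pandas as pd

raw_data = {'file': [123, 342, 223, 134, 235, 233],
            'identity': [12, 12, 12, 12, 14, 14],
            'line': [1, 2, 3, 4, 5, 6],
            'date': ['10/27/2013', '10/27/2013', '10/27/2013', '10/27/2013', '10/20/2013', '10/20/2013'],
            'time': ['13:20:00', '13:20:30', '13:21:00', '13:21:30', '15:40:00', '15:40:30']}

FMT = '%m/%d/%Y %H:%M:%S'  # assumed layout of the date and time strings


def closest_row(df, identity, date, time):
    """Hypothetical helper: one-row data frame for the closest time match."""
    # keep only the rows that match the identity and date parameters
    subset = df[(df['identity'] == identity) & (df['date'] == date)]
    if subset.empty:
        raise ValueError('no rows match the given identity and date')

    # parse the target and the candidate rows into full timestamps
    target = pd.to_datetime(f'{date} {time}', format=FMT)
    stamps = pd.to_datetime(subset['date'] + ' ' + subset['time'], format=FMT)

    # absolute difference to the target, then the index label of the smallest one
    diff = (stamps - target).abs()
    best = diff.idxmin()

    # selecting with a list of labels keeps the result a data frame
    result = subset.loc[[best], ['identity', 'date', 'time']].copy()
    result['difference'] = int(diff.loc[best].total_seconds())
    return result


df = pd.DataFrame(raw_data)
result = closest_row(df, 12, '10/27/2013', '13:20:21')
print(result)
```

For the parameters in the question this selects the 13:20:30 row with a difference of 9 seconds. If two rows are equally close, idxmin returns the first of them.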