I have a DataFrame:
import pandas as pd

df = pd.DataFrame({
    'customerId': ['A', 'A', 'A', 'A', 'A', 'A', 'A', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B', 'B'],
    'startOf15Min': ['2019-07-30T00:00:00', '2019-07-30T00:15:00',
                     '2019-07-30T07:00:00', '2019-07-30T07:15:00',
                     '2019-07-30T07:30:00', '2019-07-30T07:45:00',
                     '2019-07-30T08:00:00', '2019-07-30T00:00:00',
                     '2019-07-30T00:15:00', '2019-07-30T06:30:00',
                     '2019-07-30T06:45:00', '2019-07-30T07:00:00',
                     '2019-07-30T07:15:00', '2019-07-30T07:30:00',
                     '2019-07-30T07:45:00', '2019-07-30T08:00:00']
}, columns=['customerId', 'startOf15Min'])
# Parse the ISO 8601 strings into datetime64 values
df['startOf15Min'] = pd.to_datetime(df['startOf15Min'])
df
I need to find the 15-minute intervals that are missing between two datetimes. For example:
fr_timestamp = 2019-07-30 06:00:00
to_timestamp = 2019-07-30 09:00:00
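For customer A, the missing 15-minute intervals are: 06:00, 06:15, 06:30, 06:45 and 08:15, 08:30, 08:45.
For customer B, the missing 15-minute intervals are: 06:00, 06:15 and 08:15, 08:30, 08:45.
How can I find these intervals?
Thanks.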
Answer 0 (score: 2)
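# Half-open window [06:00, 09:00); inclusive='left' needs pandas >= 1.4 (older versions spell it closed='left')
intv = pd.date_range('2019-07-30 06:00:00', '2019-07-30 09:00:00', freq='15min', inclusive='left')
# `i in x` on a Series tests the index labels, not the values, so compare against the values explicitly
missing = df.groupby('customerId')['startOf15Min'].apply(lambda x: [i for i in intv if i not in set(x)])
print(missing['A'])
print(missing['B'])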
Output:
[Timestamp('2019-07-30 06:00:00', freq='15T'),
Timestamp('2019-07-30 06:15:00', freq='15T'),
Timestamp('2019-07-30 06:30:00', freq='15T'),
Timestamp('2019-07-30 06:45:00', freq='15T'),
Timestamp('2019-07-30 08:15:00', freq='15T'),
Timestamp('2019-07-30 08:30:00', freq='15T'),
Timestamp('2019-07-30 08:45:00', freq='15T')]
[Timestamp('2019-07-30 06:00:00', freq='15T'),
Timestamp('2019-07-30 06:15:00', freq='15T'),
Timestamp('2019-07-30 08:15:00', freq='15T'),
Timestamp('2019-07-30 08:30:00', freq='15T'),
Timestamp('2019-07-30 08:45:00', freq='15T')]
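If a flat table is more convenient than one list per customer, the same idea can be sketched as an anti-join: build every (customer, slot) pair expected in the window, left-merge the observed rows onto it, and keep the pairs that found no match. This is only a sketch under a few assumptions: the names `expected_slots`, `grid`, `merged` and `missing_df` are made up for illustration, the window is treated as half-open (`to_timestamp` itself is excluded, as in the answer above), and the sample data is rebuilt so the snippet runs on its own.

```python
import pandas as pd

# Sample data from the question, rebuilt so this sketch is self-contained.
df = pd.DataFrame({
    'customerId': ['A'] * 7 + ['B'] * 9,
    'startOf15Min': pd.to_datetime([
        '2019-07-30T00:00:00', '2019-07-30T00:15:00',
        '2019-07-30T07:00:00', '2019-07-30T07:15:00',
        '2019-07-30T07:30:00', '2019-07-30T07:45:00',
        '2019-07-30T08:00:00', '2019-07-30T00:00:00',
        '2019-07-30T00:15:00', '2019-07-30T06:30:00',
        '2019-07-30T06:45:00', '2019-07-30T07:00:00',
        '2019-07-30T07:15:00', '2019-07-30T07:30:00',
        '2019-07-30T07:45:00', '2019-07-30T08:00:00',
    ]),
})

fr_timestamp = pd.Timestamp('2019-07-30 06:00:00')
to_timestamp = pd.Timestamp('2019-07-30 09:00:00')

# All 15-minute slot starts in [fr_timestamp, to_timestamp).
# Filtering with "<" sidesteps the closed=/inclusive= keyword difference
# between pandas versions.
expected_slots = pd.date_range(fr_timestamp, to_timestamp, freq='15min')
expected_slots = expected_slots[expected_slots < to_timestamp]

# Every (customer, slot) combination that should exist in the window.
grid = pd.MultiIndex.from_product(
    [df['customerId'].unique(), expected_slots],
    names=['customerId', 'startOf15Min'],
).to_frame(index=False)

# Anti-join: rows marked 'left_only' are expected slots with no observation.
merged = grid.merge(df, on=['customerId', 'startOf15Min'],
                    how='left', indicator=True)
missing_df = (
    merged.loc[merged['_merge'] == 'left_only', ['customerId', 'startOf15Min']]
    .reset_index(drop=True)
)
print(missing_df)
```

The result has one row per missing slot, which is usually easier to filter, count or export than a Series of lists. A customer with no rows at all in `df` would not appear, since the customer list is taken from `df` itself.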