I think I can best explain what I want to achieve with an example. Suppose I have this dataframe:
time
0 2013-01-01 12:56:00
1 2013-01-01 12:00:12
2 2013-01-01 10:34:28
3 2013-01-01 09:34:54
4 2013-01-01 08:34:55
5 2013-01-01 16:35:19
6 2013-01-01 16:35:30
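Given an interval T, I would like to count, for each row, how many registers were "opened" during that period of time. For example, with T = 2 hours, this would be the output: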
time count
0 2013-01-01 12:56:00 1 # 12:56-2 = 10:56 -> 1 register between [10:56, 12:56)
1 2013-01-01 12:00:12 1
2 2013-01-01 10:34:28 2 # 10:34:28-2 = 8:34:28 -> 2 registers between [8:34:28, 10:34:28)
3 2013-01-01 09:34:54 1
4 2013-01-01 08:34:55 0
5 2013-01-01 16:35:19 0
6 2013-01-01 16:35:30 1
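I would like to know how to get this result using pandas. If I consider only the dt.hour accessor, for example, then for T equal to 1 I can create a column of per-hour counts, shift it by 1, and sum count[i] + count[i-1]. However, I don't know whether this can be generalized to the desired output.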
Answer 0 (score: 2)
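The idea here is to mark every register open time as +1 and every register close time as -1. Then sort by time and take a cumulative sum over the +/-1 values to get the number of registers open at any given time.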
The code below produces the resulting output:
import pandas as pd

# initialize interval start times as +1, end times as -1
start_times = df.assign(time=df['time'] - pd.Timedelta(hours=2), count=1)
# stack the start (+1) and end (-1) events into one frame
all_times = pd.concat([start_times, df.assign(count=-1)], ignore_index=True)
# sort by time and perform a cumulative sum to get the count of overlaps at a given time
# (subtract 1 since you don't want to include the current value in the overlap)
all_times = all_times.sort_values(by='time')
all_times['count'] = all_times['count'].cumsum() - 1
# reassign to the original dataframe: index alignment keeps rows 0..len(df)-1,
# i.e. the start-time rows, in the original row order (assumes a default RangeIndex)
df['count'] = all_times['count']
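
For anyone who wants to try this end to end, here is a minimal self-contained sketch of the same +1/-1 sweep, run on the sample data from the question. The helper name `count_in_window` and the `delta` column are made up for illustration and are not part of the answer above. The sketch assumes the timestamps are already parsed as datetimes, and it gives no special treatment to ties between timestamps.

```python
import pandas as pd


def count_in_window(times, window):
    """For each timestamp, count the other timestamps in the preceding window.

    Hypothetical helper: builds +1 events at (time - window) and -1 events
    at time, then reads the running total at each +1 event.
    """
    times = pd.Series(times).reset_index(drop=True)
    n = len(times)
    opens = pd.DataFrame({"time": times - window, "delta": 1})
    closes = pd.DataFrame({"time": times, "delta": -1})
    # Rows 0..n-1 are the opens (original row order), rows n..2n-1 the closes.
    events = pd.concat([opens, closes], ignore_index=True)
    # A stable sort keeps the order of equal timestamps deterministic.
    events = events.sort_values("time", kind="mergesort")
    # Subtract 1 so a row does not count its own window.
    running = events["delta"].cumsum() - 1
    # Restore the original order and keep only the values at the opens.
    return running.sort_index().iloc[:n].to_numpy()


df = pd.DataFrame({"time": pd.to_datetime([
    "2013-01-01 12:56:00",
    "2013-01-01 12:00:12",
    "2013-01-01 10:34:28",
    "2013-01-01 09:34:54",
    "2013-01-01 08:34:55",
    "2013-01-01 16:35:19",
    "2013-01-01 16:35:30",
])})
df["count"] = count_in_window(df["time"], pd.Timedelta(hours=2))
print(df)
```

On the sample data the `count` column comes out as 1, 1, 2, 1, 0, 0, 1, matching the desired output in the question. Passing a different `pd.Timedelta` changes T without touching the rest of the logic.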