Suppose I have:
import pandas
d = {
'datetime': ['2010-01-08 09:45:00', '2010-01-08 10:00:00',
'2010-01-08 10:15:00', '2010-01-08 10:30:00',
'2010-01-08 10:45:00', '2010-01-08 11:00:00',
'2010-01-08 11:15:00', '2010-01-08 11:30:00',
'2010-01-08 11:45:00', '2010-01-08 12:00:00',
'2010-01-08 12:15:00', '2010-01-08 12:30:00',
'2010-01-08 12:45:00', '2010-01-08 13:00:00',
'2010-01-08 13:15:00', '2010-01-08 13:30:00',
'2010-01-08 13:45:00', '2010-01-08 14:00:00',
'2010-01-08 14:15:00', '2010-01-08 14:30:00',
'2010-01-08 14:45:00', '2010-01-08 15:00:00',
'2010-01-08 15:15:00', '2010-01-08 15:30:00',
'2010-01-08 15:45:00', '2010-01-08 16:00:00',
'2010-01-08 16:15:00'],
'Total-tops': [0,-1,-1,2,3,0,0,4,0,0,0,0,5,6,7,8,-1,0,0,0,0,0,0,0,-1,-1,2]
}
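df = pandas.DataFrame(d)
# Parse the strings so the index is a DatetimeIndex (the function below relies on .time)
df['datetime'] = pandas.to_datetime(df['datetime'])
df = df.set_index('datetime')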
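I'd like to add another column, a boolean, indicating whether the row will break. A break means that the Total-tops value is greater than 1 and a -1 then occurs somewhere in the future. For example, the first 2 would break at the next -1 it encounters. This is the desired dataframe:
This is the function I'm currently using, but it runs very slowly because I iterate over all the rows.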
def does_break(data):
    cur_breaks = []
    for index, row in data.iterrows():
        if row['Total-tops'] > 1:
            # Get all rows after this time that are new tops
            breaks = data[(data['Total-tops'] == -1) & (data.index.time > index.time())]
            if len(breaks) > 0:
                cur_breaks.append(True)
            else:
                cur_breaks.append(False)
        else:
            cur_breaks.append(False)
    return cur_breaks
Answer 0 (score: 1)
You could use this somewhat unwieldy expression:
In [56]: import numpy as np
In [57]: ((np.cumsum((df['Total-tops'] == -1)[:: -1])[:: -1] > 0) & (df['Total-tops'] > 0)).astype(int)
Out[57]:
datetime
2010-01-08 09:45:00 0
2010-01-08 10:00:00 0
2010-01-08 10:15:00 0
2010-01-08 10:30:00 1
2010-01-08 10:45:00 1
2010-01-08 11:00:00 0
2010-01-08 11:15:00 0
2010-01-08 11:30:00 1
2010-01-08 11:45:00 0
2010-01-08 12:00:00 0
2010-01-08 12:15:00 0
2010-01-08 12:30:00 0
2010-01-08 12:45:00 1
2010-01-08 13:00:00 1
2010-01-08 13:15:00 1
2010-01-08 13:30:00 1
2010-01-08 13:45:00 0
2010-01-08 14:00:00 0
2010-01-08 14:15:00 0
2010-01-08 14:30:00 0
2010-01-08 14:45:00 0
2010-01-08 15:00:00 0
2010-01-08 15:15:00 0
2010-01-08 15:30:00 0
2010-01-08 15:45:00 0
2010-01-08 16:00:00 0
2010-01-08 16:15:00 0
Name: Total-tops, dtype: int64
(Of course, for a new column, you can use df['breaks'] = ... .)
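To make the one-liner easier to follow, here is a minimal sketch that splits it into named steps. The short series and the variable names (tops, is_break, breaks_ahead, has_future_break, is_top) are made up for illustration and are not the data from the question.

```python
import pandas as pd

# Small illustrative series (hypothetical values, not the question's data).
tops = pd.Series([2, 0, -1, 3, -1, 4])

# True wherever a -1 occurs.
is_break = tops == -1

# Reverse, take a running count, reverse back:
# this gives the number of -1 values at or after each row.
breaks_ahead = is_break[::-1].cumsum()[::-1]
has_future_break = breaks_ahead > 0

# Same threshold as the expression above (> 0).
is_top = tops > 0

result = (has_future_break & is_top).astype(int)
print(result.tolist())
```

The final 4 has no -1 after it, so it is not marked, while the 2 and the 3 are. Note that the question's wording says "greater than 1", so the threshold in is_top may need to be > 1 depending on what is intended.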
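It works as follows: the reversed cumsum counts how many -1 values occur at or after each row, so comparing it with 0 marks the rows that still have a -1 in the future. Combining that with df['Total-tops'] > 0 keeps only the rows where both conditions are true.
Answer 1 (score: 1)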
How about this:
latest_break = df.index[(df['Total-tops'] == -1)].max()
df['break'] = 1
df['break'] = df['break'].where((df['Total-tops'] > 0) & (df.index < latest_break), 0)
This sets break to 1 for every positive value that occurs before the latest break.
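The same idea can also be written as a single boolean mask, without first filling the column with a placeholder. The sketch below is only an illustration: it uses a shortened, made-up sample in the same shape as the question's data, and it assumes the index is sorted in time order.

```python
import pandas as pd

# Shortened, made-up sample in the same shape as the question's data.
df = pd.DataFrame(
    {"Total-tops": [0, -1, 2, 3, 0, -1, 2]},
    index=pd.to_datetime([
        "2010-01-08 09:45:00", "2010-01-08 10:00:00", "2010-01-08 10:15:00",
        "2010-01-08 10:30:00", "2010-01-08 10:45:00", "2010-01-08 11:00:00",
        "2010-01-08 11:15:00",
    ]),
)
df.index.name = "datetime"

# Timestamp of the last -1; rows after it can never break.
latest_break = df.index[df["Total-tops"] == -1].max()

# Positive value AND strictly before the last -1.
df["break"] = ((df["Total-tops"] > 0) & (df.index < latest_break)).astype(int)
print(df["break"].tolist())
```

The trailing 2 comes after the last -1, so it stays 0. If the data might contain no -1 at all, latest_break would need a separate check before the comparison.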