我正在尝试扩展我当前的模式以适应额外条件+ - 最后一个值的百分比而不是严格匹配之前的值。
data = np.array([[2,30],[2,900],[2,30],[2,30],[2,30],[2,1560],[2,30],
[2,300],[2,30],[2,450]])
df = pd.DataFrame(data)
df.columns = ['id','interval']
UPDATE 2(id fix):更新了数据2,包含更多数据:
data2 = np.array([[2,30],[2,900],[2,30],[2,29],[2,31],[2,30],[2,29],[2,31],[2,1560],[2,30],[2,300],[2,30],[2,450], [3,40],[3,900],[3,40],[3,39],[3,41], [3,40],[3,39],[3,41] ,[3,1560],[3,40],[3,300],[3,40],[3,450]])
df2 = pd.DataFrame(data2)
df2.columns = ['id','interval']
for i, g in df.groupby([(df.interval != df.interval.shift()).cumsum()]):
if len(g.interval.tolist())>=3:
print(g.interval.tolist())
会产生[30,30,30]
然而,我真的想要接近数字条件,说一个数字是 + - 10%之前的数字。
所以看df2我想收拾系列[30,29,31]
for i, g in df2.groupby([(df2.interval != <???+- 10% magic ???>).cumsum()]):
if len(g.interval.tolist())>=3:
print(g.interval.tolist())
更新:这是行结束处理代码,我将收集的列表存储到以ID作为键的字典中
leak_intervals = {}
final_leak_intervals = {}
serials = []
for i, g in df.groupby([(df.interval != df.interval.shift()).cumsum()]):
if len(g.interval.tolist()) >= 3:
print(g.interval.tolist())
serial = g.id.values[0]
if serial not in serials:
serials.append(serial)
if serial not in leak_intervals:
leak_intervals[serial] = g.interval.tolist()
else:
leak_intervals[serial] = leak_intervals[serial] + (g.interval.tolist())
答案 0 :(得分:1)
<强>更新强>
In [116]: df2.groupby(df2.interval.pct_change().abs().gt(0.1).cumsum()) \
.filter(lambda x: len(x) >= 3)
Out[116]:
id interval
2 2 30
3 2 29
4 2 31
5 2 30
6 2 29
7 2 31
15 3 40
16 3 39
17 2 41
18 2 40
19 2 39
20 2 41