I have a DataFrame like this:
  log_alerts
0         no
1        yes
2        yes
3         no
4        yes
5        yes
6        yes
7        yes
I am trying to find consecutive yes values in the log_alerts column; at the third consecutive yes the code should flag it.
Expected output:
  log_alerts           message
0         no              none
1        yes              none
2        yes              none
3         no              none
4        yes              none
5        yes              none
6        yes  continuity found
7        yes      Review again
How can I do this?
Can it be done with the pandas library?
Answer 0 (score: 0)
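I believe you need to compare the column with its shift-ed copy and take the cumsum to label consecutive groups, keep only the yes rows, use cumcount to number the rows within each group, add reindex so the resulting Series has the same index as the original DataFrame, and finally build the message column with numpy.select: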
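# Give each run of identical consecutive values its own id
a = df['log_alerts'].ne(df['log_alerts'].shift()).cumsum()
# Keep only the ids of the 'yes' rows
a = a[df['log_alerts'] == 'yes']
# Number the rows within each run from 0, then restore the dropped rows with 0
counts = a.groupby(a).cumcount().reindex(df.index, fill_value=0)
print(counts)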
0 0
1 0
2 1
3 0
4 0
5 1
6 2
7 3
dtype: int64
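import numpy as np

# counts == 2 marks the third consecutive 'yes'; counts > 2 marks every one after it
masks = [counts == 2, counts > 2]
df['message'] = np.select(masks, ['continuity found','Review again'], default=None)
print(df)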
  log_alerts           message
0         no              None
1        yes              None
2        yes              None
3         no              None
4        yes              None
5        yes              None
6        yes  continuity found
7        yes      Review again
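For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same steps. The DataFrame is rebuilt from the sample data in the question, and the helper name label_consecutive_yes and its threshold parameter are my own assumptions for illustration, not part of pandas or of the code above.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {"log_alerts": ["no", "yes", "yes", "no", "yes", "yes", "yes", "yes"]}
)


def label_consecutive_yes(frame, column="log_alerts", threshold=3):
    """Hypothetical helper: label the threshold-th consecutive 'yes' and later ones."""
    values = frame[column]
    # A new run starts wherever the value differs from the previous row.
    run_id = values.ne(values.shift()).cumsum()
    # Keep only the 'yes' rows, then number them 0, 1, 2, ... within each run.
    yes_runs = run_id[values == "yes"]
    counts = yes_runs.groupby(yes_runs).cumcount().reindex(frame.index, fill_value=0)
    # Position threshold - 1 is the threshold-th 'yes' in a row.
    masks = [counts == threshold - 1, counts > threshold - 1]
    labels = np.select(masks, ["continuity found", "Review again"], default=None)
    return pd.Series(labels, index=frame.index, name="message")


df["message"] = label_consecutive_yes(df)
print(df)
```

With the default threshold of 3 this should give the same message column as the output above; a different threshold only moves the point at which a run starts being labelled.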
Details:
print (type(a))
<class 'pandas.core.series.Series'>
print (a)
1 2
2 2
4 4
5 4
6 4
7 4
Name: log_alerts, dtype: int32
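To see where those ids come from, it may help to look at the intermediate values before the yes filter is applied. This is only a sketch on the sample data from the question; the names s, changed, run_id and yes_runs are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question.
s = pd.Series(
    ["no", "yes", "yes", "no", "yes", "yes", "yes", "yes"], name="log_alerts"
)

# True wherever a row differs from the previous one
# (the first row is compared with NaN, so it always counts as a change).
changed = s.ne(s.shift())
# The running total of changes gives every consecutive run its own id.
run_id = changed.cumsum()
# Keeping only the 'yes' rows leaves the ids of the two 'yes' runs.
yes_runs = run_id[s == "yes"]

print(pd.DataFrame({"log_alerts": s, "changed": changed, "run_id": run_id}))
print(yes_runs.tolist())
```

Runs 2 and 4 are the two yes streaks, which matches the values printed for a above; runs 1 and 3 are the no rows that the filter drops.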