Suppose we have a pandas DataFrame whose values are 0 or 1, for example:
import pandas as pd
a = pd.DataFrame([1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0,
1, 1, 1, 1, 1, 0, 0, 1, 1], columns=['instance'])
I count consecutive occurrences of 1, resetting the count to 0 whenever a 0 appears. For example:
count, b = 0, []
for i in a.instance:
    if i == 0:
        count = 0
        b.append(count)
    else:
        count += 1
        b.append(count)
I then put the counts next to the original column:
b = pd.DataFrame(b, columns=['count_check'])
c = pd.concat((a, b), axis=1)
Result:
    instance  count_check
0          1            1
1          1            2
2          1            3
3          0            0
4          0            0
5          0            0
6          1            1
7          1            2
8          1            3
9          1            4
10         0            0
11         1            1
12         1            2
13         1            3
14         1            4
15         1            5
16         0            0
17         0            0
18         1            1
19         1            2
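It works fine, but it is a bit slow on larger datasets and when it has to be run repeatedly. Is there a faster, more elegant way to do the same thing?
Thanks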
Answer 0: (score: 1)
a['count_check'] = a.apply(lambda x: x.groupby((~x.astype(bool)).cumsum()).cumsum())
Output:
    instance  count_check
0          1            1
1          1            2
2          1            3
3          0            0
4          0            0
5          0            0
6          1            1
7          1            2
8          1            3
9          1            4
10         0            0
11         1            1
12         1            2
13         1            3
14         1            4
15         1            5
16         0            0
17         0            0
18         1            1
19         1            2
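
To make it easier to see why the one-liner works, here is a rough step-by-step sketch of the same idea applied directly to the column rather than through `apply`. The names `s` and `group_id` are only illustrative and are not part of the original answer. The running count of zeros is used as a group label, so every 0 opens a new group, and the cumulative sum of the 0/1 values inside each group is then the length of the current run of 1s.

```python
import pandas as pd

a = pd.DataFrame([1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0,
                  1, 1, 1, 1, 1, 0, 0, 1, 1], columns=['instance'])

s = a['instance']  # illustrative name for the 0/1 column

# Every 0 bumps the running total, so each 0 starts a new group
# and the 1s that follow it share that group's label.
group_id = (s == 0).cumsum()

# Inside a group the leading 0 contributes nothing, so the cumulative
# sum simply counts the 1s seen since the last 0.
a['count_check'] = s.groupby(group_id).cumsum()

print(a)
```

Working on the column itself also means the assignment can be rerun after `count_check` has been added, whereas `a.apply(...)` would then see two columns.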