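My input looks like the df below.
I need to group by columns (A, B) and compute the length of each run of consecutive zeros within each group, then write the result to a new column "Consec_zero_count".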
Input:
A B DATE hour measure
A10 1 1/1/2014 0 0
A10 1 1/1/2014 1 0
A10 1 1/1/2014 2 0
A10 1 1/1/2014 3 0
A10 2 1/1/2014 4 0
A10 2 1/1/2014 5 1
A10 2 1/1/2014 6 2
A10 3 1/1/2014 7 0
A11 1 1/1/2014 8 0
A11 1 1/1/2014 9 0
A11 1 1/1/2014 10 2
A11 1 1/1/2014 11 0
A11 1 1/1/2014 12 0
A12 2 1/1/2014 13 1
A12 2 1/1/2014 14 3
A12 2 1/1/2014 15 0
A12 4 1/1/2014 16 5
A12 4 1/1/2014 17 0
A12 6 1/1/2014 18 0
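I tried using groupby to get the groups, but I am stuck on counting the consecutive zeros within each group. I also tried a lambda function, but it counts the total number of zeros, whereas I am interested in runs of consecutive zeros. I would like my output to look like this: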
Output
A B DATE hour measure Consec_zero_count
A10 1 1/1/2014 0 0 4
A10 1 1/1/2014 1 0 4
A10 1 1/1/2014 2 0 4
A10 1 1/1/2014 3 0 4
A10 2 1/1/2014 4 0 1
A10 2 1/1/2014 5 1 0
A10 2 1/1/2014 6 2 0
A10 3 1/1/2014 7 0 1
A11 1 1/1/2014 8 0 2
A11 1 1/1/2014 9 0 2
A11 1 1/1/2014 10 2 0
A11 1 1/1/2014 11 0 2
A11 1 1/1/2014 12 0 2
A12 2 1/1/2014 13 1 0
A12 2 1/1/2014 14 3 0
A12 2 1/1/2014 15 0 1
A12 4 1/1/2014 16 5 0
A12 4 1/1/2014 17 0 1
A12 6 1/1/2014 18 0 1
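To make the problem concrete, here is a minimal sketch of how a per-group zero count goes wrong. It assumes the attempt was a `transform` with a lambda over the (A, B) groups; the exact code that was tried is not shown in the question, so the lambda and the name `total_zeros` are hypothetical. Only the (A11, 1) rows from the sample are used, since that group has zeros split into two runs.

```python
import pandas as pd

# The (A11, 1) rows from the sample: two runs of zeros separated by a 2.
df = pd.DataFrame({
    "A": ["A11"] * 5,
    "B": [1] * 5,
    "measure": [0, 0, 2, 0, 0],
})

# Hypothetical attempt: count every zero in the (A, B) group.
total_zeros = df.groupby(["A", "B"])["measure"].transform(lambda s: (s == 0).sum())

print(total_zeros.tolist())  # [4, 4, 4, 4, 4], but the desired result is [2, 2, 0, 2, 2]
```

The group key alone cannot tell the two runs apart, so some extra key that changes whenever a run ends is needed.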
Any leads would be appreciated. Thanks in advance!
Answer 0 (score: 2)
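Create a helper Series that labels the unique groups of consecutive values: compare the values with their shift-ed values using ne (!=), then take the cumsum. Then use groupby with transform and size. Finally, use numpy.where to keep the counts only for rows where measure is 0: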
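import numpy as np
# helper Series: the id increases whenever 'measure' changes value
g = df['measure'].ne(df['measure'].shift()).cumsum()
# length of each run of equal values within every (A, B) group
counts = df.groupby(['A','B', g])['measure'].transform('size')
# keep the run length only where 'measure' is 0, otherwise 0
df['Consec_zero_count'] = np.where(df['measure'].eq(0), counts, 0)
print(df)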
A B DATE hour measure Consec_zero_count
0 A10 1 1/1/2014 0 0 4
1 A10 1 1/1/2014 1 0 4
2 A10 1 1/1/2014 2 0 4
3 A10 1 1/1/2014 3 0 4
4 A10 2 1/1/2014 4 0 1
5 A10 2 1/1/2014 5 1 0
6 A10 2 1/1/2014 6 2 0
7 A10 3 1/1/2014 7 0 1
8 A11 1 1/1/2014 8 0 2
9 A11 1 1/1/2014 9 0 2
10 A11 1 1/1/2014 10 2 0
11 A11 1 1/1/2014 11 0 2
12 A11 1 1/1/2014 12 0 2
13 A12 2 1/1/2014 13 1 0
14 A12 2 1/1/2014 14 3 0
15 A12 2 1/1/2014 15 0 1
16 A12 4 1/1/2014 16 5 0
17 A12 4 1/1/2014 17 0 1
18 A12 6 1/1/2014 18 0 1
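For anyone who wants to run this end to end, the following is a self-contained sketch that rebuilds the sample data and keeps the intermediate steps visible. The DATE and hour columns are left out because they do not affect the result, and the names `run_id` and `run_length` are illustrative, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Sample data from the question (DATE and hour omitted, they play no role here).
df = pd.DataFrame({
    "A": ["A10"] * 8 + ["A11"] * 5 + ["A12"] * 6,
    "B": [1, 1, 1, 1, 2, 2, 2, 3, 1, 1, 1, 1, 1, 2, 2, 2, 4, 4, 6],
    "measure": [0, 0, 0, 0, 0, 1, 2, 0, 0, 0, 2, 0, 0, 1, 3, 0, 5, 0, 0],
})

# Step 1: a row starts a new run when its value differs from the previous row.
starts_new_run = df["measure"].ne(df["measure"].shift())

# Step 2: the cumulative sum turns those flags into one id per run.
run_id = starts_new_run.cumsum()

# Step 3: grouping by (A, B, run_id) also splits runs that cross an (A, B) boundary.
run_length = df.groupby(["A", "B", run_id])["measure"].transform("size")

# Step 4: report the run length only for zero rows.
df["Consec_zero_count"] = np.where(df["measure"].eq(0), run_length, 0)

print(df)
```

Note that `run_id` is computed over the whole frame, so a run of zeros can share one id across two (A, B) groups (rows 3 and 4, for example). Including A and B in the `groupby` keys is what separates them.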
Answer 1 (score: 0)
Similar to @jezrael's answer, but with slightly different logic:
df.loc[df.measure.eq(0), 'Consec_zero_count'] = (df.groupby(['A','B', df.measure.ne(0).cumsum()])
.measure.transform(lambda x: x[x.eq(0)].size))
df['Consec_zero_count'] = df['Consec_zero_count'].fillna(0).astype(int)
>>> df
A B DATE hour measure Consec_zero_count
0 A10 1 1/1/2014 0 0 4
1 A10 1 1/1/2014 1 0 4
2 A10 1 1/1/2014 2 0 4
3 A10 1 1/1/2014 3 0 4
4 A10 2 1/1/2014 4 0 1
5 A10 2 1/1/2014 5 1 0
6 A10 2 1/1/2014 6 2 0
7 A10 3 1/1/2014 7 0 1
8 A11 1 1/1/2014 8 0 2
9 A11 1 1/1/2014 9 0 2
10 A11 1 1/1/2014 10 2 0
11 A11 1 1/1/2014 11 0 2
12 A11 1 1/1/2014 12 0 2
13 A12 2 1/1/2014 13 1 0
14 A12 2 1/1/2014 14 3 0
15 A12 2 1/1/2014 15 0 1
16 A12 4 1/1/2014 16 5 0
17 A12 4 1/1/2014 17 0 1
18 A12 6 1/1/2014 18 0 1
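The two answers differ only in the helper key: one starts a new id whenever the value changes, the other whenever a non-zero value appears. Below is a rough sketch that puts both side by side on a small made-up frame and checks that they agree. The function names `by_run_id` and `by_nonzero_cumsum` and the test data are hypothetical, chosen so that a run of zeros crosses a group boundary.

```python
import pandas as pd

def by_run_id(df):
    # Key changes whenever 'measure' changes value; the group size is the run length.
    key = df["measure"].ne(df["measure"].shift()).cumsum()
    counts = df.groupby(["A", "B", key])["measure"].transform("size")
    return counts.where(df["measure"].eq(0), 0)

def by_nonzero_cumsum(df):
    # Key changes at every non-zero value, so each group holds at most one
    # non-zero row followed by a run of zeros; count only the zeros.
    key = df["measure"].ne(0).cumsum()
    counts = df.groupby(["A", "B", key])["measure"].transform(lambda s: s.eq(0).sum())
    return counts.where(df["measure"].eq(0), 0)

# Made-up data: the zeros at positions 3-6 span the boundary between A == "x" and A == "y".
df = pd.DataFrame({
    "A": ["x"] * 6 + ["y"] * 3,
    "B": [1] * 9,
    "measure": [0, 0, 3, 0, 0, 0, 0, 4, 0],
})

first = by_run_id(df).tolist()
second = by_nonzero_cumsum(df).tolist()
print(first)   # [2, 2, 0, 3, 3, 3, 1, 0, 1]
print(second)  # [2, 2, 0, 3, 3, 3, 1, 0, 1]
```

The first variant uses the built-in `'size'` aggregation, which is usually faster than a Python lambda on large frames, though that has not been measured here.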