I want to count the longest run of consecutive zeros in each row of the DataFrame shown below. Please help.
   DEC  JAN  FEB  MARCH  APRIL  MAY  consecutive zeros
0    X    X    X      1      0    1                  0
1    X    X    X      1      0    1                  0
2    0    0    1      0      0    1                  2
3    1    0    0      0      1    1                  3
4    0    0    0      0      0    1                  5
5    X    1    1      0      0    0                  3
6    1    0    0      1      0    0                  2
7    0    0    0      0      1    0                  4
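Answer 0 (score: 1)
Here's my two cents...
Treat every other non-zero element as 1, and you have a binary code. All you need to do now is find the "largest interval" of 0s with no bit flip.
We can write a function for this and apply it with a lambda.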
import numpy as np

def len_consec_zeros(a):
    a = np.array(list(a))               # split the string into single characters
    rr = np.argwhere(a == '0').ravel()  # find out positions of `0`
    if not rr.size:                     # if there are no zeros, return 0
        return 0
    full = np.arange(rr[0], rr[-1]+1)   # get the range of spread of 0s
    # get the indices where `0` was flipped to something else
    diff = np.setdiff1d(full, rr)
    if not diff.size:                   # no bit flips: the full range is one run
        return len(full) if len(full) != 1 else 0  # a lone `0` counts as 0
    # break the array into pieces wherever there's a bit flip
    # and the result is the size of the largest chunk
    pos, difs = full[0], []
    for el in diff:
        difs.append(el - pos)
        pos = el + 1
    difs.append(full[-1]+1 - pos)
    # return size of the largest chunk (a lone `0` counts as 0)
    res = max(difs) if max(difs) != 1 else 0
    return res
Now that you have this function, call it on each row...
# join all columns to get a string column
# assuming you have your data in `df`
df['concated'] = df.astype(str).apply(lambda x: ''.join(x), axis=1)
df['consecutive_zeros'] = df.concated.apply(lambda x: len_consec_zeros(x))
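The same "longest stretch of 0 without a flip" idea can also be sketched with the standard library alone. This is only an illustrative alternative, not the answer's own method: the helper name `longest_zero_run` and its `min_len` parameter are hypothetical, and it assumes each row has already been joined into a string of single characters, as in the `concated` column above. The rule that a lone zero counts as 0 is taken from the expected output in the question.

```python
from itertools import groupby

def longest_zero_run(chars, min_len=2):
    """Length of the longest run of '0' in `chars`.

    Runs shorter than `min_len` are reported as 0 (assumption based on
    the expected output in the question).
    """
    # groupby yields one (key, group) pair per run of equal characters
    longest = max(
        (sum(1 for _ in group) for key, group in groupby(chars) if key == "0"),
        default=0,  # no zeros at all
    )
    return longest if longest >= min_len else 0

rows = ["XXX101", "001001", "100011", "000001", "X11000", "000010"]
print([longest_zero_run(r) for r in rows])  # [0, 2, 3, 5, 3, 4]
```

With the string column in place, this could be applied as `df.concated.apply(longest_zero_run)`.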
Answer 1 (score: 0)
Here's one approach -
import numpy as np

# Inspired by https://stackoverflow.com/a/44385183/
def pos_neg_counts(mask):
    # indices where the mask value changes between neighbours
    idx = np.flatnonzero(mask[1:] != mask[:-1])
    if len(idx)==0:  # To handle all 0s or all 1s cases
        if mask[0]:
            return np.array([mask.size]), np.array([0])
        else:
            return np.array([0]), np.array([mask.size])
    else:
        # lengths of the alternating runs, in order of appearance
        count = np.r_[ [idx[0]+1], idx[1:] - idx[:-1], [mask.size-1-idx[-1]] ]
        if mask[0]:
            return count[::2], count[1::2]  # True, False counts
        else:
            return count[1::2], count[::2]  # True, False counts

def get_consecutive_zeros(df):
    arr = df.values
    mask = (arr==0) | (arr=='0')  # match zeros stored as numbers or as strings
    zero_count = np.array([pos_neg_counts(i)[0].max() for i in mask])
    zero_count[zero_count<2] = 0  # a lone zero does not count as a run
    return zero_count
Sample run -
In [272]: df
Out[272]:
   DEC  JAN  FEB  MARCH  APRIL  MAY
0    X    X    X      1      0    1
1    X    X    X      1      0    1
2    0    0    1      0      0    1
3    1    0    0      0      1    1
4    0    0    0      0      0    1
5    X    1    1      0      0    0
6    1    0    0      1      0    0
7    0    0    0      0      1    0

In [273]: df['consecutive_zeros'] = get_consecutive_zeros(df)

In [274]: df
Out[274]:
   DEC  JAN  FEB  MARCH  APRIL  MAY  consecutive_zeros
0    X    X    X      1      0    1                  0
1    X    X    X      1      0    1                  0
2    0    0    1      0      0    1                  2
3    1    0    0      0      1    1                  3
4    0    0    0      0      0    1                  5
5    X    1    1      0      0    0                  3
6    1    0    0      1      0    0                  2
7    0    0    0      0      1    0                  4
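To see how run lengths fall out of the transition indices, here is a small standalone sketch for a single row. It is a variation on the technique rather than the answer's exact code: it uses `np.diff` over run boundaries instead of slicing alternate counts, and the example mask and the variable names are made up for illustration.

```python
import numpy as np

# True marks a zero in the row (hypothetical example mask)
mask = np.array([True, True, False, True, True, True, False])

# positions i where mask[i] != mask[i+1], i.e. the last index of each run
idx = np.flatnonzero(mask[1:] != mask[:-1])

# run boundaries: start of the array, one past each transition, end of the array
bounds = np.r_[0, idx + 1, mask.size]
run_lengths = np.diff(bounds)   # length of every run, in order
run_values = mask[bounds[:-1]]  # the mask value each run starts with

zero_runs = run_lengths[run_values]  # keep only the runs of zeros
print(idx)              # [1 2 5]
print(run_lengths)      # [2 1 3 1]
print(zero_runs.max())  # 3
```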
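Answer 2 (score: -1)
For each row, you want cumsum(1-row), reset at every point where row == 1. Then you take the maximum over the row.
For example

import pandas as pd

ts = pd.Series([0,0,0,0,1,1,0,0,1,1,1,0])
ts2 = 1-ts                 # 1 wherever the original value is 0
tsgroup = ts.cumsum()      # group id that increases at every 1
consec_0 = ts2.groupby(tsgroup).transform(pd.Series.cumsum)
consec_0.max()

gives you 4, as required.
Write that into a function and apply it to your data frame.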
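One way that last step might look is sketched below. The function name `max_consecutive_zeros` is hypothetical, and two assumptions go beyond this answer: cells are compared as strings so that the `X` placeholders are handled (any non-zero cell resets the count, not only 1), and a run of length 1 is reported as 0 to match the expected output in the question.

```python
import pandas as pd

def max_consecutive_zeros(row):
    # Assumption: compare as strings so 'X', '1', 0 and '0' are all handled
    is_zero = row.astype(str).eq("0")
    # group id that increases at every non-zero cell, so each group holds
    # one non-zero cell followed by the zeros after it
    groups = (~is_zero).cumsum()
    # running count of zeros inside each group
    running = is_zero.astype(int).groupby(groups).cumsum()
    longest = int(running.max())
    # Assumption from the question's expected output: a lone zero counts as 0
    return longest if longest >= 2 else 0

# data frame rebuilt from the question
df = pd.DataFrame(
    [list("XXX101"), list("XXX101"), list("001001"), list("100011"),
     list("000001"), list("X11000"), list("100100"), list("000010")],
    columns=["DEC", "JAN", "FEB", "MARCH", "APRIL", "MAY"],
)
df["consecutive_zeros"] = df.apply(max_consecutive_zeros, axis=1)
print(df["consecutive_zeros"].tolist())  # [0, 0, 2, 3, 5, 3, 2, 4]
```

Row-wise `apply` is simple but not fast; for large frames a vectorized NumPy approach is likely to scale better.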