I have a DataFrame and want to find the maximum number of consecutive zeros in column B.
Example input and output:
df = pd.DataFrame({'B':[1,3,4,0,0,11,1,15,0,0,0,87]})
df_out = pd.DataFrame({'max_count':[3]})
How can I do this?
Answer 0 (score: 11)
One NumPy approach:
a = df['B'].values
m1 = np.r_[False, a==0, False]
idx = np.flatnonzero(m1[:-1] != m1[1:])
out = (idx[1::2]-idx[::2]).max()
Step-by-step run:
# Input data as array
In [83]: a
Out[83]: array([ 1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87])
# Mask of 0s, padded with False at both ends so every island of 0s has a start and an end
In [193]: m1
Out[193]:
array([False, False, False, False,  True,  True, False, False, False,
        True,  True,  True, False, False])
# Indices of those starts and ends
In [85]: idx
Out[85]: array([ 3, 5, 8, 11])
# Finally the differencing between starts and ends and max for final o/p
In [86]: out
Out[86]: 3
This can be condensed into a one-liner:
np.diff(np.flatnonzero(np.diff(np.r_[0,a==0,0])).reshape(-1,2),axis=1).max()
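One caveat with the snippets above: if B contains no zeros, idx is empty and .max() raises a ValueError. Below is a minimal sketch of the same start/end differencing wrapped in a function that returns 0 in that case. The function name max_zero_run is just an illustrative choice, not part of NumPy or pandas.

```python
import numpy as np
import pandas as pd


def max_zero_run(values):
    """Return the length of the longest run of zeros (0 if there are none)."""
    a = np.asarray(values)
    # Pad the zero mask with False on both sides so every run has a start and an end.
    mask = np.r_[False, a == 0, False]
    # Positions where the mask flips: even entries are run starts, odd entries are run ends.
    idx = np.flatnonzero(mask[:-1] != mask[1:])
    if idx.size == 0:
        # No zeros at all, so there is nothing to take the max of.
        return 0
    return int((idx[1::2] - idx[::2]).max())


df = pd.DataFrame({'B': [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]})
print(max_zero_run(df['B'].to_numpy()))  # 3
print(max_zero_run([1, 2, 3]))           # 0
```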
Answer 1 (score: 5)
You can create a group id for each run of consecutive equal values:
# create group for consecutive numbers
df['grp'] = (df['B'] != df['B'].shift()).cumsum()
     B  grp
0    1    1
1    3    2
2    4    3
3    0    4
4    0    4
5   11    5
6    1    6
7   15    7
8    0    8
9    0    8
10   0    8
11  87    9
# check size of groups having 0 value
max_count = df.query("B == 0").groupby('grp').size().max()
print(max_count)
3
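If you would rather not add a helper column to df, the same grouping idea can be kept in local variables. This is only a sketch: the function name max_zero_run_groupby is made up for illustration, and returning 0 when there are no zeros is an assumption about what you want in that case.

```python
import pandas as pd


def max_zero_run_groupby(series):
    """Longest run of zeros in a Series, using run ids and groupby."""
    is_zero = series.eq(0)
    if not is_zero.any():
        # Assumed convention: no zeros means a maximum run length of 0.
        return 0
    # A new run id starts whenever the value differs from the previous row.
    run_id = (series != series.shift()).cumsum()
    # Keep only the zero rows and count how many fall into each run.
    run_lengths = is_zero[is_zero].groupby(run_id[is_zero]).size()
    return int(run_lengths.max())


df = pd.DataFrame({'B': [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]})
print(max_zero_run_groupby(df['B']))  # 3
```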
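Answer 2 (score: 3)
The idea is to build a run counter from the cumulative sum of a non-zero mask, keep only the rows where B is 0, count each label with Series.value_counts, and take the maximum: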
s = df['B'].ne(0)
a = s.cumsum()[~s].value_counts().max()
print (a)
3
df_out=pd.DataFrame({'max_count':[a]})
Details:
print (s.cumsum())
0 1
1 2
2 3
3 3
4 3
5 4
6 5
7 6
8 6
9 6
10 6
11 7
Name: B, dtype: int32
print (s.cumsum()[~s])
3 3
4 3
8 6
9 6
10 6
Name: B, dtype: int32
print (s.cumsum()[~s].value_counts())
6 3
3 2
Name: B, dtype: int64
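Note that when B has no zeros, the filtered Series is empty and value_counts().max() gives NaN rather than 0. A minimal sketch of the same cumulative-sum idea with that case handled; the function name max_zero_run_cumsum is hypothetical, and returning 0 is an assumed convention.

```python
import pandas as pd


def max_zero_run_cumsum(series):
    """Longest run of zeros in a Series, using cumsum labels and value_counts."""
    nonzero = series.ne(0)
    if nonzero.all():
        # No zeros (or an empty Series): assumed convention is to return 0.
        return 0
    # The cumulative sum only increases on non-zero rows, so it stays
    # constant across each run of zeros and can serve as a run label.
    labels = nonzero.cumsum()[~nonzero]
    return int(labels.value_counts().max())


df = pd.DataFrame({'B': [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]})
df_out = pd.DataFrame({'max_count': [max_zero_run_cumsum(df['B'])]})
print(df_out)
```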
Answer 3 (score: 1)
Maybe you can adapt this to Python. In Java, you can find the length of the longest run of consecutive 0s with the following code:
int[] B = {1,3,4,0,0,11,1,15,0,0,0,87};
int max_zeroes = 0;
int zeroes = 0;
for (int i = 0; i < B.length; i++) {
    if (B[i] == 0) {
        zeroes += 1;
        if (zeroes > max_zeroes) {
            max_zeroes = zeroes;
        }
    } else {
        zeroes = 0; // a non-zero value ends the current run
    }
}
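For reference, a direct Python adaptation of this counting loop could look like the sketch below. The function name max_consecutive_zeros is just an illustrative choice; it accepts any iterable, so a pandas column such as df['B'] should work as well as a plain list.

```python
def max_consecutive_zeros(values):
    """Return the length of the longest run of zeros in an iterable."""
    max_zeros = 0
    zeros = 0
    for v in values:
        if v == 0:
            zeros += 1
            max_zeros = max(max_zeros, zeros)
        else:
            # A non-zero value ends the current run.
            zeros = 0
    return max_zeros


B = [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]
print(max_consecutive_zeros(B))  # 3
```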
If you also want the start and end indices of the longest run of consecutive 0s in the array, you can use the following logic:
int max_zeroes = 0;
int zeroes = 0;
int endIndex = -1;
for (int i = 0; i < B.length; i++) {
    if (B[i] == 0) {
        zeroes += 1;
        if (zeroes > max_zeroes) {
            max_zeroes = zeroes;
            endIndex = i;
        }
    } else {
        zeroes = 0;
    }
}
// Walk backwards from the end of the longest run to find where it starts.
int startIndex = endIndex;
for (int i = endIndex - 1; i > -1; i--) {
    if (B[i] == 0) {
        startIndex = i;
    } else {
        break; // first non-zero value before the run
    }
}
System.out.println("Max zeroes is: " + max_zeroes + " at start index " + startIndex + " and end index: " + endIndex);
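A possible Python version of the index-tracking logic is sketched below. It differs from the Java code in one respect: instead of scanning backwards, it derives the start index from the end index and the run length. The function name longest_zero_run and the (0, -1, -1) result for input without zeros are assumptions made for this sketch.

```python
def longest_zero_run(values):
    """Return (length, start, end) of the first longest run of zeros.

    Returns (0, -1, -1) when there are no zeros (an assumed convention).
    """
    max_zeros = 0
    zeros = 0
    end_index = -1
    for i, v in enumerate(values):
        if v == 0:
            zeros += 1
            if zeros > max_zeros:
                max_zeros = zeros
                end_index = i
        else:
            zeros = 0
    # The run is contiguous, so its start follows from its end and length.
    start_index = end_index - max_zeros + 1 if max_zeros else -1
    return max_zeros, start_index, end_index


B = [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]
print(longest_zero_run(B))  # (3, 8, 10)
```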