How do I find the maximum number of consecutive zeros in a pandas column?

Asked: 2020-09-04 08:17:27

Tags: python-3.x pandas numpy pandas-groupby

I have a DataFrame and want to find the maximum count of consecutive zero values in column B.

Sample input and expected output:

df = pd.DataFrame({'B':[1,3,4,0,0,11,1,15,0,0,0,87]})

df_out = pd.DataFrame({'max_count':[3]})

How can this be done?

4 answers:

Answer 0: (score: 11)

One NumPy approach:

import numpy as np

a = df['B'].values
m1 = np.r_[False, a==0, False]
idx = np.flatnonzero(m1[:-1] != m1[1:])
out = (idx[1::2]-idx[::2]).max()

Step-by-step run:

# Input data as array
In [83]: a
Out[83]: array([ 1,  3,  4,  0,  0, 11,  1, 15,  0,  0,  0, 87])

# Mask of zeros, padded with False at both ends so every island of 0s has a start and an end
In [193]: m1
Out[193]: 
array([False, False, False, False,  True,  True, False, False, False,
        True,  True,  True, False, False])

# Indices of those starts and ends
In [85]: idx
Out[85]: array([ 3,  5,  8, 11])

# Finally, subtract the starts from the ends and take the max for the final output
In [86]: out
Out[86]: 3

This can be condensed into a one-liner:

np.diff(np.flatnonzero(np.diff(np.r_[0,a==0,0])).reshape(-1,2),axis=1).max()
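One thing to watch for: if the column contains no zeros at all, `idx` is empty and `.max()` raises a `ValueError`. Below is a minimal sketch that wraps the same steps in a function and returns 0 in that case. The function name `max_zero_run` and the choice of 0 as the fallback are assumptions made for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd


def max_zero_run(values):
    """Length of the longest run of consecutive zeros (0 if there are none)."""
    a = np.asarray(values)
    # Pad the zero mask with False on both sides so that runs touching the
    # array boundaries still produce a start and an end transition.
    m1 = np.r_[False, a == 0, False]
    # Positions where the mask flips: even entries are run starts,
    # odd entries are one past the run ends.
    idx = np.flatnonzero(m1[:-1] != m1[1:])
    if idx.size == 0:  # assumption: no zeros should give 0, not an error
        return 0
    return int((idx[1::2] - idx[::2]).max())


df = pd.DataFrame({'B': [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]})
print(max_zero_run(df['B'].values))  # 3
```

Because the mask is padded, runs at the very beginning or end of the column are counted the same way as runs in the middle.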

Answer 1: (score: 5)

You can create a group id for each run of consecutive equal values:

# create group for consecutive numbers
df['grp'] = (df['B'] != df['B'].shift()).cumsum()

     B  grp
0    1    1
1    3    2
2    4    3
3    0    4
4    0    4
5   11    5
6    1    6
7   15    7
8    0    8
9    0    8
10   0    8
11  87    9


# check size of groups having 0 value
max_count = df.query("B == 0").groupby('grp').size().max()

print(max_count)
3
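If you would rather not add a helper column to `df`, the same grouping can be computed on the fly. This is only a sketch of that variation; the variable names `grp` and `zero_runs` and the fallback to 0 when there are no zeros are assumptions.

```python
import pandas as pd

df = pd.DataFrame({'B': [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]})

# Run id: increases by one each time the value differs from the previous row.
grp = (df['B'] != df['B'].shift()).cumsum()

# Keep only the zero rows and count how many fall into each run.
# The grouping Series is aligned to the filtered rows by index.
zero_runs = df.loc[df['B'] == 0, 'B'].groupby(grp).size()

# An empty result means the column has no zeros; fall back to 0 (an assumption).
max_count = int(zero_runs.max()) if not zero_runs.empty else 0
print(max_count)  # 3
```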

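Answer 2: (score: 3)

The idea is to build a mask of the non-zero values and use its cumulative sum as a counter that stays constant across each run of consecutive zeros. Then keep only the rows where the value is 0, count the occurrences with Series.value_counts, and take the maximum: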

s = df['B'].ne(0)

a = s.cumsum()[~s].value_counts().max()
print (a)
3

df_out=pd.DataFrame({'max_count':[a]})

Details:

print (s.cumsum())
0     1
1     2
2     3
3     3
4     3
5     4
6     5
7     6
8     6
9     6
10    6
11    7
Name: B, dtype: int32

print (s.cumsum()[~s])
3     3
4     3
8     6
9     6
10    6
Name: B, dtype: int32

print (s.cumsum()[~s].value_counts())
6    3
3    2
Name: B, dtype: int64
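When the column has no zeros, `value_counts()` is empty and `.max()` returns NaN rather than 0. A small sketch that wraps the idea in a function and handles that case follows; the name `longest_zero_run` and the fallback value of 0 are assumptions for illustration.

```python
import pandas as pd


def longest_zero_run(col):
    """Longest run of consecutive zeros in a Series (0 if there are none)."""
    nonzero = col.ne(0)
    # The cumulative sum only advances on non-zero rows, so all zeros in
    # the same run share one counter value.
    counts = nonzero.cumsum()[~nonzero].value_counts()
    # Assumption: return 0 instead of NaN when there are no zeros.
    return int(counts.max()) if len(counts) else 0


df = pd.DataFrame({'B': [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]})
print(longest_zero_run(df['B']))  # 3
```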

Answer 3: (score: 1)

Maybe you can adapt this to Python. In Java, you can find the length of the longest run of consecutive 0s with the following code:

int B[] = {1,3,4,0,0,11,1,15,0,0,0,87};

int max_zeroes = 0;
int zeroes = 0;
for(int i = 0; i < B.length; i++) {
    if( B[i] == 0) {
        zeroes += 1;
        if(zeroes > max_zeroes) {
            max_zeroes = zeroes;
        }
    } else {
        zeroes = 0;
    }
}
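A direct Python translation of this loop could look like the sketch below. The function name `max_consecutive_zeros` is made up for illustration; the logic mirrors the Java code line for line.

```python
def max_consecutive_zeros(values):
    """Return the length of the longest run of consecutive zeros."""
    max_zeroes = 0
    zeroes = 0
    for v in values:
        if v == 0:
            zeroes += 1
            # Remember the longest run seen so far.
            max_zeroes = max(max_zeroes, zeroes)
        else:
            # A non-zero value ends the current run.
            zeroes = 0
    return max_zeroes


B = [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]
print(max_consecutive_zeros(B))  # 3
```

With a DataFrame, the same function can be called as `max_consecutive_zeros(df['B'])`, although a plain Python loop will be slower than the vectorized answers on large columns.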

If you also want the start and end indices of the longest run of consecutive 0s in the array, you can use the following logic:

int max_zeroes = 0;
int zeroes = 0;
int endIndex = -1;
for (int i = 0; i < B.length; i++) {
    if (B[i] == 0) {
        zeroes += 1;
        if (zeroes > max_zeroes) {
            max_zeroes = zeroes;
            endIndex = i;
        }
    } else {
        zeroes = 0;
    }
}

// Walk backwards from the end of the longest run to find where it starts.
int startIndex = endIndex;
for (int i = endIndex - 1; i > -1; i--) {
    if (B[i] == 0) {
        startIndex = i;
    } else {
        break; // stop at the first non-zero value before the run
    }
}

System.out.println("Max zeroes is: " + max_zeroes + " at start index " + startIndex + " and end index: " + endIndex);

Maybe you can adapt this to Python.
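Following that suggestion, here is one possible Python adaptation of the index-tracking version. It is a sketch, not the answer author's code: the function name `longest_zero_span` is hypothetical, and instead of walking backwards it derives the start index from the end index and the run length, which gives the same result because the run is contiguous. Indices of -1 signal that no zero was found, mirroring the initial value of `endIndex` in the Java code.

```python
def longest_zero_span(values):
    """Return (length, start_index, end_index) of the longest run of zeros."""
    max_zeroes = 0
    zeroes = 0
    end_index = -1
    for i, v in enumerate(values):
        if v == 0:
            zeroes += 1
            # Strict comparison: on ties the earliest run is kept.
            if zeroes > max_zeroes:
                max_zeroes = zeroes
                end_index = i
        else:
            zeroes = 0
    # The run is contiguous, so its start follows from its end and length.
    start_index = end_index - max_zeroes + 1 if max_zeroes else -1
    return max_zeroes, start_index, end_index


B = [1, 3, 4, 0, 0, 11, 1, 15, 0, 0, 0, 87]
print(longest_zero_span(B))  # (3, 8, 10)
```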