Question

Suppose I have the following Series. For each one, I want the mean of the lengths of its runs of consecutive zeros.

import pandas as pd

s1 = pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0]) # [1, 2, 3] -> mean: 2
s2 = pd.Series([1, 1, 2]) # [0] -> 0
s3 = pd.Series([1, 0, 0, 1]) # [2] -> 2
s4 = pd.Series([0, 0, 1, 0, 0, 0]) # [2, 3] -> 2.5

I tried to solve this with .shift, .cumsum and .eq, but couldn't find a solution. Any help would be appreciated. Thanks.
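To pin down the expected results before looking at the vectorized answers, here is a plain-Python reference sketch. The helper names `zero_run_lengths` and `mean_zero_run` are hypothetical and used for illustration only. The question writes s2's runs as [0]. This sketch instead returns an empty list for s2 and falls back to a mean of 0, which gives the same answer.

```python
import pandas as pd

def zero_run_lengths(values):
    """Return the length of every run of consecutive zeros, in order."""
    runs = []
    current = 0
    for v in values:
        if v == 0:
            current += 1          # extend the current run of zeros
        elif current:
            runs.append(current)  # a non-zero value closes the run
            current = 0
    if current:
        runs.append(current)      # close a run that reaches the end
    return runs

def mean_zero_run(values):
    """Mean run length, or 0 when there are no zeros at all."""
    runs = zero_run_lengths(values)
    return sum(runs) / len(runs) if runs else 0

s1 = pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0])
s2 = pd.Series([1, 1, 2])
s3 = pd.Series([1, 0, 0, 1])
s4 = pd.Series([0, 0, 1, 0, 0, 0])

for s in (s1, s2, s3, s4):
    print(zero_run_lengths(s), mean_zero_run(s))
```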

Answer 1

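The idea is to label runs of consecutive equal values with Series.shift and Series.cumsum, keep only the labels of the 0 values, count the rows per label with Series.value_counts, and finally take the mean: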

a = s.ne(s.shift()).cumsum()[s==0].value_counts().mean()
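The one-liner can be split into named steps to show the intermediate values for s1. This is only a sketch, and the variable names are illustrative rather than part of the answer:

```python
import pandas as pd

s = pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0])  # s1 from the question

# True wherever the value differs from the previous one, i.e. a new run starts
starts = s.ne(s.shift())
# The running total of run starts gives every run its own integer label
groups = starts.cumsum()
# Keep only the labels at positions that hold a zero
zero_groups = groups[s == 0]
# Number of rows per label = length of each zero run
run_lengths = zero_groups.value_counts()

print(groups.tolist())       # [1, 2, 3, 4, 4, 5, 6, 6, 6]
print(zero_groups.tolist())  # [2, 4, 4, 6, 6, 6]
print(run_lengths.mean())    # 2.0
```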

For the second Series the result is a missing value (NaN). It can be replaced with 0 using the trick that np.nan == np.nan is False:

a = a if a == a else 0
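Putting the pieces together, the one-liner and the NaN fallback can be wrapped in a small function. The name `mean_zero_run` is hypothetical. The sketch returns 0.0 rather than 0 so that the result is always a float:

```python
import pandas as pd

def mean_zero_run(s: pd.Series) -> float:
    """Mean length of the runs of consecutive zeros in s (0.0 if s has no zeros)."""
    a = s.ne(s.shift()).cumsum()[s == 0].value_counts().mean()
    # mean() of an empty Series is NaN, and NaN is the only value not equal to itself
    return a if a == a else 0.0

for values in ([1, 0, 2, 0, 0, 2, 0, 0, 0], [1, 1, 2], [1, 0, 0, 1], [0, 0, 1, 0, 0, 0]):
    print(mean_zero_run(pd.Series(values)))
```

pd.isna(a) is a more explicit alternative to the a == a comparison.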

Answer 2

Or use itertools.groupby:

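from itertools import groupby
import numpy as np
import pandas as pd

s1 = pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0])

# each True run is a run of zeros; sum(g) counts the Trues in it
np.mean([sum(g) for k, g in groupby(s1.eq(0)) if k])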

Output:

2.0
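For a Series with no zeros, such as s2, the list comprehension is empty. np.mean([]) then returns nan and emits a RuntimeWarning, so a guard is needed to get 0. Here is a sketch with that guard added; the function name is hypothetical:

```python
from itertools import groupby

import numpy as np
import pandas as pd

def mean_zero_run(s: pd.Series) -> float:
    """Mean length of runs of zeros via itertools.groupby; 0.0 if there are none."""
    # groupby yields (key, group) for each run of equal booleans; keep the True runs
    lengths = [sum(1 for _ in g) for k, g in groupby(s.eq(0)) if k]
    # np.mean([]) would return nan with a RuntimeWarning, so handle the empty case
    return float(np.mean(lengths)) if lengths else 0.0

for values in ([1, 0, 2, 0, 0, 2, 0, 0, 0], [1, 1, 2], [1, 0, 0, 1], [0, 0, 1, 0, 0, 0]):
    print(mean_zero_run(pd.Series(values)))
```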

Answer 3

To compute the average length of the zero runs in a sequence:

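1. Count the runs of zeros, i.e. how many times a zero appears that is not directly preceded by another zero. I used itertools.groupby from Python's standard library for this.
2. Count the total number of zeros in the sequence. This can be done with collections.Counter from the standard library.
3. Compute the average as total_zero_count / unique_zero_count.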
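Why does the final division give the mean? The runs partition the zeros, so the run lengths add up to the total zero count. Dividing the total by the number of runs is therefore exactly the mean run length. The sketch below checks this on the question's four Series by comparing the division against an explicit average of the run lengths. Both function names are illustrative only:

```python
from collections import Counter
from itertools import groupby

import pandas as pd

def via_division(s: pd.Series) -> float:
    # Collapse each run to a single key, then count how many of those keys are 0
    runs = Counter(key for key, _ in groupby(s))[0]
    # Count every zero
    total = Counter(s)[0]
    # Total zeros / number of zero runs (0.0 when there are no zeros)
    return total / runs if runs else 0.0

def via_run_lengths(s: pd.Series) -> float:
    # Reference: list each zero run's length explicitly, then average
    lengths = [sum(1 for _ in g) for key, g in groupby(s) if key == 0]
    return sum(lengths) / len(lengths) if lengths else 0.0

for values in ([1, 0, 2, 0, 0, 2, 0, 0, 0], [1, 1, 2], [1, 0, 0, 1], [0, 0, 1, 0, 0, 0]):
    s = pd.Series(values)
    print(via_division(s), via_run_lengths(s))
```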

You can wrap everything neatly in a class like this:

import itertools
from collections import Counter

import pandas as pd

class ComputeAvgZero:
    """Count avg zeros in the given sequence."""

    def __init__(self, series):
        self.series: pd.Series = series

    def compute_avg_zero(self):
        """Main method that computes the average."""

        unique_zeros = self._count_unique_zeros(self.series) 
        total_zeros = self._count_total_zeros(self.series)

        if unique_zeros:
            avg_zeros = total_zeros / unique_zeros
        else:
            avg_zeros = 0

        return avg_zeros


    @staticmethod
    def _count_unique_zeros(series: pd.Series) -> int:
        """Count the runs of consecutive zeros (each run counts once)."""

        # keep only the first element of each run of equal values
        series = [i[0] for i in itertools.groupby(series)]

        # count how many of the collapsed runs are zero
        unique_zero_count = Counter(series)[0]

        return unique_zero_count

    @staticmethod
    def _count_total_zeros(series: pd.Series) -> int:
        """Count all the zeroes."""

        total_zero_count = Counter(series)[0]

        return total_zero_count

Here is the class in action:

# compute average
s = pd.Series([0, 0, 1, 0, 0, 0])

obj = ComputeAvgZero(s)
avg_zeros = obj.compute_avg_zero()

print(avg_zeros)

This should give you:

2.5

Pandas: find the mean of the maximum consecutive zero counts for each interval
