pandas: find the mean of the maximum consecutive-zero counts for each interval

Time: 2020-02-05 07:29:33

Tags: python pandas

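Suppose I have the following Series. For each one, I want to find the mean of the maximum consecutive-zero counts of each interval, that is, the mean length of the runs of consecutive zeros.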

s1 = pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0]) # [1, 2, 3]-> mean: 2
s2 = pd.Series([1, 1, 2]) # [0] -> 0
s3 = pd.Series([1, 0, 0, 1]) # [2] -> 2
s4 = pd.Series([0, 0, 1, 0, 0, 0]) # [2, 3] -> 2.5

I tried to solve this with .shift, .cumsum and .eq, but could not find a solution. Any help would be appreciated. Thanks.

3 answers:

Answer 0: (score: 2)

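The idea is to create groups of consecutive values with Series.shift and Series.cumsum, keep only the groups of 0, count the size of each group with Series.value_counts, and finally take the mean: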

a = s.ne(s.shift()).cumsum()[s==0].value_counts().mean()
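To make the chained expression easier to follow, here is a minimal sketch that splits it into its intermediate steps for s1 from the question. The variable names (group_ids, zero_group_ids, run_lengths) are only illustrative and are not part of the original answer.

```python
import pandas as pd

s = pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0])

# A new group starts wherever a value differs from the previous one,
# so the cumulative sum gives every run of equal values its own id.
group_ids = s.ne(s.shift()).cumsum()

# Keep only the group ids that sit at positions holding a zero.
zero_group_ids = group_ids[s == 0]

# Each id appears once per element of its run, so the counts are run lengths.
run_lengths = zero_group_ids.value_counts()

print(sorted(run_lengths.tolist()))  # [1, 2, 3]
print(run_lengths.mean())            # 2.0
```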

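For the second Series (s2) this returns a missing value, because there are no zeros to count. Since np.nan == np.nan is False, the NaN can be replaced by 0 with this trick:

a = a if a == a else 0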
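Putting both steps together, one possible sketch wraps them in a small helper and checks it against the four Series from the question. The function name mean_zero_run_length is hypothetical and chosen here only for illustration.

```python
import pandas as pd

def mean_zero_run_length(s: pd.Series) -> float:
    """Mean length of the runs of consecutive zeros, or 0 if there are none."""
    a = s.ne(s.shift()).cumsum()[s == 0].value_counts().mean()
    # NaN is the only value that is not equal to itself, so this maps NaN to 0.
    return float(a) if a == a else 0.0

s1 = pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0])
s2 = pd.Series([1, 1, 2])
s3 = pd.Series([1, 0, 0, 1])
s4 = pd.Series([0, 0, 1, 0, 0, 0])

for s in (s1, s2, s3, s4):
    print(mean_zero_run_length(s))  # 2.0, 0.0, 2.0, 2.5
```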

Answer 1: (score: 1)

Alternatively, use itertools.groupby:

from itertools import groupby
import numpy as np

np.mean([sum(g) for k, g in groupby(s1.eq(0)) if k])

Output:

2.0
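Note that for a Series without any zeros, such as s2, the list passed to np.mean is empty and the result is nan rather than 0. A hedged variant that handles this case explicitly might look like the following; the function name mean_zero_run_length_groupby is an assumption made for this example.

```python
from itertools import groupby

import pandas as pd

def mean_zero_run_length_groupby(s: pd.Series) -> float:
    """Mean length of the zero runs via itertools.groupby; 0 if no zeros."""
    # groupby yields (key, run) pairs; the key is True for runs of zeros.
    runs = [sum(1 for _ in g) for k, g in groupby(s.eq(0)) if k]
    return sum(runs) / len(runs) if runs else 0.0

print(mean_zero_run_length_groupby(pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0])))  # 2.0
print(mean_zero_run_length_groupby(pd.Series([1, 1, 2])))                    # 0.0
```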

Answer 2: (score: 1)

To compute the average number of zeros per run in the sequence:

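  1. Find the number of non-consecutive occurrences of zero, i.e. the number of separate zero runs. I have used Python's built-in itertools.groupby function to achieve this.
  2. Find the total number of zeros in the sequence. This can be done with the built-in collections.Counter.
  3. Compute the average by dividing total_zero_count / unique_zero_count.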

You can wrap everything neatly in a class like this:

import itertools
from collections import Counter

import pandas as pd  # needed for the pd.Series annotations below

class ComputeAvgZero:
    """Count avg zeros in the given sequence."""

    def __init__(self, series):
        self.series : pd.Series = series

    def compute_avg_zero(self):
        """Main method that computes the average."""

        unique_zeros = self._count_unique_zeros(self.series) 
        total_zeros = self._count_total_zeros(self.series)

        if unique_zeros:
            avg_zeros = total_zeros / unique_zeros
        else:
            avg_zeros = 0

        return avg_zeros


    @staticmethod
    def _count_unique_zeros(series:pd.Series) -> int:
        """Counting the times zero appears non consecutively."""

        # keep only the first element of each run of equal values
        series = [i[0] for i in itertools.groupby(series)]

        # count the non-consecutive occurrences of zero
        unique_zero_count = Counter(series)[0]

        return unique_zero_count

    @staticmethod
    def _count_total_zeros(series:pd.Series) -> int:
        """Count all the zeroes."""

        total_zero_count = Counter(series)[0]

        return total_zero_count

You can see the class in action here:

# compute average
s = pd.Series([0, 0, 1, 0, 0, 0])

obj = ComputeAvgZero(s)
avg_zeros = obj.compute_avg_zero()

print(avg_zeros)

This should give you:

2.5
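The same total-over-runs idea can also be sketched without a class, using only vectorized pandas operations. This is an illustrative alternative and not part of the original answer; the function name avg_zero_run_vectorized is hypothetical.

```python
import pandas as pd

def avg_zero_run_vectorized(s: pd.Series) -> float:
    """Total number of zeros divided by the number of zero runs; 0 if no zeros."""
    is_zero = s.eq(0)
    # A run starts at a zero whose predecessor is not a zero
    # (the first element has no predecessor, so it is filled with False).
    run_starts = is_zero & ~is_zero.shift(fill_value=False)
    n_runs = int(run_starts.sum())
    return float(is_zero.sum()) / n_runs if n_runs else 0.0

print(avg_zero_run_vectorized(pd.Series([0, 0, 1, 0, 0, 0])))  # 2.5
print(avg_zero_run_vectorized(pd.Series([1, 1, 2])))           # 0.0
```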