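Suppose I have the following Series. For each one, I want the average of the lengths of its runs of consecutive zeros (the maximal consecutive zero count of each stretch).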
import pandas as pd

s1 = pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0])  # [1, 2, 3] -> mean: 2
s2 = pd.Series([1, 1, 2])                    # [0] -> 0
s3 = pd.Series([1, 0, 0, 1])                 # [2] -> 2
s4 = pd.Series([0, 0, 1, 0, 0, 0])           # [2, 3] -> 2.5
I tried to solve it with .shift, .cumsum and .eq, but couldn't find a way. Any help would be appreciated. Thanks.
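To make the expected outputs concrete: the bracketed lists in the comments above are the lengths of the maximal runs of zeros, with [0] standing in for a series that has no zeros at all. A minimal pure-Python sketch that reproduces those lists; the helper name `zero_run_lengths` is hypothetical and not part of the question.

```python
from itertools import groupby

import pandas as pd


def zero_run_lengths(s: pd.Series) -> list[int]:
    """Return the length of every maximal run of zeros in s ([0] if there are none)."""
    # groupby splits the values into runs of equal consecutive values;
    # keep only the runs whose key is 0 and measure how long each one is.
    runs = [len(list(g)) for value, g in groupby(s) if value == 0]
    return runs or [0]


examples = {
    "s1": pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0]),
    "s2": pd.Series([1, 1, 2]),
    "s3": pd.Series([1, 0, 0, 1]),
    "s4": pd.Series([0, 0, 1, 0, 0, 0]),
}
for name, s in examples.items():
    runs = zero_run_lengths(s)
    print(name, runs, sum(runs) / len(runs))
```

The average is then just the mean of each list, which matches the expected values 2, 0, 2 and 2.5.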
Answer 0 (score: 2)
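The idea is to label runs of consecutive values with Series.shift and Series.cumsum, keep only the rows that are 0, count the rows in each run with Series.value_counts, and finally take the mean: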
a = s.ne(s.shift()).cumsum()[s==0].value_counts().mean()
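To see why this works, here are the intermediate steps on s1 from the question. `s.ne(s.shift())` is True wherever a value differs from its predecessor (the first row compares against the NaN introduced by shift, so it is True as well), so its cumulative sum gives every run of equal values its own integer label. Keeping only the rows where s == 0 leaves the labels of the zero runs, and value_counts() counts how many rows carry each label, i.e. the length of each zero run.

```python
import pandas as pd

s = pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0])

# True at the first element of every run of equal values
starts = s.ne(s.shift())
# Running total of run starts gives one integer label per run: 1, 2, 3, 4, 4, 5, 6, 6, 6
run_id = starts.cumsum()
# Keep only the labels of rows that hold a zero: 2, 4, 4, 6, 6, 6
zero_run_id = run_id[s == 0]
# Rows per label = length of each zero run (value_counts sorts by count, so sort by label)
run_lengths = zero_run_id.value_counts().sort_index()
print(run_lengths.tolist())  # [1, 2, 3]
print(run_lengths.mean())    # 2.0
```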
For the second Series (no zeros at all) this returns a missing value (NaN), so it can be replaced with 0 using the trick that np.nan == np.nan is False:
a = a if a == a else 0
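Putting the one-liner and the NaN fallback together, a minimal sketch as a function; the name `mean_zero_run` is hypothetical. It uses `pd.isna` instead of the `a == a` trick, which performs the same check more explicitly.

```python
import pandas as pd


def mean_zero_run(s: pd.Series) -> float:
    """Mean length of the runs of consecutive zeros in s, or 0.0 if s has no zeros."""
    # Label each run of equal values, keep zero rows, count rows per run, average
    a = s.ne(s.shift()).cumsum()[s == 0].value_counts().mean()
    # The mean of an empty Series is NaN, which is what happens when s has no zeros
    return 0.0 if pd.isna(a) else float(a)


for s in (pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0]),
          pd.Series([1, 1, 2]),
          pd.Series([1, 0, 0, 1]),
          pd.Series([0, 0, 1, 0, 0, 0])):
    print(mean_zero_run(s))  # 2.0, 0.0, 2.0, 2.5
```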
Answer 1 (score: 1)
Or use itertools.groupby:
from itertools import groupby
import numpy as np
np.mean([sum(g) for k, g in groupby(s1.eq(0)) if k])
Output:
2.0
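One caveat: for a series with no zeros, such as s2, the list comprehension is empty and np.mean([]) returns nan (NumPy also emits a RuntimeWarning). A minimal sketch that guards against that case so s2 gives the 0 the question expects; the function name `mean_zero_run_groupby` is hypothetical.

```python
from itertools import groupby

import numpy as np
import pandas as pd


def mean_zero_run_groupby(s: pd.Series) -> float:
    """Mean length of zero runs in s via itertools.groupby; 0.0 if there are none."""
    # groupby on the boolean mask yields (True, run) for each run of zeros;
    # summing a run of True values gives its length.
    lengths = [sum(g) for k, g in groupby(s.eq(0)) if k]
    # Handle the empty case explicitly instead of letting np.mean([]) return nan
    return float(np.mean(lengths)) if lengths else 0.0


print(mean_zero_run_groupby(pd.Series([1, 1, 2])))           # 0.0
print(mean_zero_run_groupby(pd.Series([0, 0, 1, 0, 0, 0])))  # 2.5
```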
Answer 2 (score: 1)
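To compute the average number of zeros per run in a sequence:
1. Count the runs of zeros (each group of consecutive zeros counts once). You can use the itertools.groupby function to achieve this.
2. Count the total number of zeros. This can be done with collections.Counter.
3. Compute total_zero_count / unique_zero_count.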
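Because the run lengths add up to the total number of zeros, the mean run length is exactly total zeros divided by the number of runs. The same ratio can be computed without leaving pandas; this is a swapped-in vectorized variant, not the answer's class, and the name `avg_zero_ratio` is hypothetical. A zero starts a new run exactly when the previous value is not zero (or there is no previous value), so counting those positions gives the number of runs.

```python
import pandas as pd


def avg_zero_ratio(s: pd.Series) -> float:
    """Total zeros divided by the number of zero runs; 0.0 if s has no zeros."""
    is_zero = s.eq(0)
    # Total number of zeros
    total_zero_count = int(is_zero.sum())
    # A run starts at a zero whose predecessor is not a zero; fill_value=False
    # makes the first row count as "no zero before it".
    run_starts = is_zero & ~is_zero.shift(fill_value=False)
    unique_zero_count = int(run_starts.sum())
    # The ratio, with the no-zeros case handled explicitly
    return total_zero_count / unique_zero_count if unique_zero_count else 0.0


for s in (pd.Series([1, 0, 2, 0, 0, 2, 0, 0, 0]),
          pd.Series([1, 1, 2]),
          pd.Series([1, 0, 0, 1]),
          pd.Series([0, 0, 1, 0, 0, 0])):
    print(avg_zero_ratio(s))  # 2.0, 0.0, 2.0, 2.5
```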
You can wrap all of this neatly in a class like this:
import itertools
from collections import Counter

import pandas as pd


class ComputeAvgZero:
    """Compute the average number of zeros per run in the given sequence."""

    def __init__(self, series):
        self.series: pd.Series = series

    def compute_avg_zero(self):
        """Main method that computes the average."""
        unique_zeros = self._count_unique_zeros(self.series)
        total_zeros = self._count_total_zeros(self.series)
        if unique_zeros:
            avg_zeros = total_zeros / unique_zeros
        else:
            # no zeros at all: define the average as 0
            avg_zeros = 0
        return avg_zeros

    @staticmethod
    def _count_unique_zeros(series: pd.Series) -> int:
        """Count the runs of zeros (consecutive zeros count once)."""
        # keep only the first value of each run of consecutive equal values
        series = [key for key, _ in itertools.groupby(series)]
        # count the remaining (non-consecutive) occurrences of zero
        unique_zero_count = Counter(series)[0]
        return unique_zero_count

    @staticmethod
    def _count_total_zeros(series: pd.Series) -> int:
        """Count all the zeros."""
        total_zero_count = Counter(series)[0]
        return total_zero_count
You can see the class in action here:
# compute average
s = pd.Series([0, 0, 1, 0, 0, 0])
obj = ComputeAvgZero(s)
avg_zeros = obj.compute_avg_zero()
print(avg_zeros)
This should give you:
2.5