Question

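I would like to get a summary of all repeated subseries of a given length in a pandas Series. Is there a way to find this information with pandas? I would also like a way to report the frequency of each subseries (perhaps a histogram?). Thanks!

For example: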

    series = 
    0    a
    1    b
    2    b
    3    b
    4    a
    5    b
    6    b
    7    a
    8    b
    9    a

    subseries_frequency(series, 3)

would return:

    [a,b,b] = 2
    [b,b,b] = 1
    [b,b,a] = 2
    [b,a,b] = 2
    [a,b,a] = 1

Answer 1

You can do it like this:

    >>> import pandas as pd
    >>> from collections import Counter
    >>> ts = pd.Series(list('abbbabbaba'))  # the series from the question
    >>> # shift() pads the end with NaN, so drop windows whose last item is NaN
    >>> pred = lambda t: pd.notna(t[-1])
    >>> shifted = (ts.shift(-j) for j in range(3))  # ts offset by 0, 1 and 2
    >>> Counter(filter(pred, zip(*shifted)))
    Counter({('a', 'b', 'b'): 2, ('b', 'a', 'b'): 2, ('b', 'b', 'a'): 2, ('b', 'b', 'b'): 1, ('a', 'b', 'a'): 1})
    >>> pd.Series(_)
    a  b  a    1
          b    2
    b  a  b    2
       b  a    2
          b    1
    dtype: int64

Alternatively,

    >>> shifted = (ts.shift(-j) for j in range(3))  # avoid shadowing the builtin iter
    >>> cnt = pd.Series(list(zip(*shifted)))
    >>> cnt.iloc[:-2].value_counts()  # the last 2 windows contain NaN padding
    (a, b, b)    2
    (b, a, b)    2
    (b, b, a)    2
    (b, b, b)    1
    (a, b, a)    1
    dtype: int64
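
If you want this as a reusable function for any window length, here is a minimal sketch of the same sliding-window idea using only the standard library. The names `subseries_frequency` (taken from the question) and `text_histogram` are hypothetical helpers, not part of pandas. It assumes the input is any iterable; a pandas Series should work too, since iterating a Series yields its values.

```python
from collections import Counter


def subseries_frequency(values, n):
    """Count every run of n consecutive items in `values`.

    Hypothetical helper, not a pandas function. `values` can be any
    iterable, for example a list or a pandas Series.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    items = list(values)
    # Offset the same list by 0..n-1 and zip the slices together.
    # zip stops at the shortest slice, so no partial windows are produced
    # and no NaN filtering is needed.
    windows = zip(*(items[j:] for j in range(n)))
    return Counter(windows)


def text_histogram(counts):
    """Render the counts as text bars, most frequent subseries first."""
    lines = []
    for window, count in counts.most_common():
        label = ",".join(map(str, window))
        lines.append(f"[{label}] {'#' * count} {count}")
    return "\n".join(lines)


counts = subseries_frequency(list("abbbabbaba"), 3)
# Keep only the subseries that occur more than once.
repeated = {w: c for w, c in counts.items() if c > 1}
print(text_histogram(counts))
```

As in the first snippet above, the resulting `Counter` can be passed to `pd.Series(...)` if you prefer to keep working in pandas.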

Summary of recurring subseries in a pandas Series
