Question

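I am trying to do something very similar to this post, except that my data are die-roll outcomes (e.g. 1-6) and I need to count streaks for every possible value of the die.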

import numpy as np
import pandas as pd

data = [5,4,3,6,6,3,5,1,6,6]
df = pd.DataFrame(data, columns = ["Outcome"])
df.head(n=10)

def f(x):

    x['c'] = (x['Outcome'] == 6).cumsum()
    x['a'] = (x['c'] == 1).astype(int)
    x['b'] = x.groupby( 'c' ).cumcount()

    x['streak'] = x.groupby( 'c' ).cumcount() + x['a']

    return x

df = df.groupby('Outcome', sort=False).apply(f)

print(df.head(n=10))

   Outcome  c  a  b  streak
0        5  0  0  0       0
1        4  0  0  0       0
2        3  0  0  0       0
3        6  1  1  0       1
4        6  2  0  0       0
5        3  0  0  1       1
6        5  0  0  1       1
7        1  0  0  0       0
8        6  3  0  0       0
9        6  4  0  0       0

My problem is that 'c' does not behave. It should "reset" its counter every time a streak is broken; otherwise a and b will be incorrect.

Ideally, I would like something as elegant as:

def f(x):
    # Parentheses let the expression span two lines without a syntax error.
    x['streak'] = (x.groupby((x['stat'] != 0).cumsum()).cumcount() +
                   ((x['stat'] != 0).cumsum() == 0).astype(int))
    return x

as shown in the linked post.

Answer 1

As mentioned, here is a solution using cumsum and cumcount, though it is not as "elegant" as hoped (i.e. not a one-liner).

I first label runs of consecutive equal values, giving each run a "block" number:

In [326]: df['block'] = (df['Outcome'] != df['Outcome'].shift(1)).astype(int).cumsum()

In [327]: df
Out[327]: 
   Outcome  block
0        5      1
1        4      2
2        3      3
3        6      4
4        6      4
5        3      5
6        5      6
7        1      7
8        6      8
9        6      8
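
To make the labeling step easier to follow, here is a minimal sketch that splits it into its intermediate pieces. The names `outcomes`, `changed`, and `steps` are chosen here for illustration only and are not part of the answer's code.

```python
import pandas as pd

outcomes = pd.Series([5, 4, 3, 6, 6, 3, 5, 1, 6, 6], name="Outcome")

# True wherever a value differs from the previous row. The first row is
# compared against NaN, so it is always True and opens block 1.
changed = outcomes != outcomes.shift(1)

# Each True bumps the running total, so all rows of one run share a label.
block = changed.cumsum()

steps = pd.DataFrame({"Outcome": outcomes, "changed": changed, "block": block})
print(steps)
```

Rows 3-4 and 8-9 (the two runs of 6) are the only places where `changed` is False, which is why they share block numbers 4 and 8.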

Since I now know where repeated values occur, I only need an incrementing count within each group:

In [328]: df['streak'] = df.groupby('block').cumcount()

In [329]: df
Out[329]: 
   Outcome  block  streak
0        5      1       0
1        4      2       0
2        3      3       0
3        6      4       0
4        6      4       1
5        3      5       0
6        5      6       0
7        1      7       0
8        6      8       0
9        6      8       1

If you want the count to start from 1, add + 1 to the last line.
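
For reference, the two steps can be folded into one small helper. This is only a sketch under the same assumptions as above; the function name `add_streak` and its `col` parameter are hypothetical, and it uses the 1-based variant.

```python
import pandas as pd

def add_streak(df, col="Outcome"):
    """Return a copy of df with a 1-based 'streak' column for runs in `col`."""
    # A new block starts whenever the value differs from the previous row.
    block = (df[col] != df[col].shift()).cumsum()
    out = df.copy()
    # cumcount() numbers rows within each block from 0, so add 1.
    out["streak"] = out.groupby(block).cumcount() + 1
    return out

df = pd.DataFrame({"Outcome": [5, 4, 3, 6, 6, 3, 5, 1, 6, 6]})
result = add_streak(df)
print(result)
```

Because the block labels are computed from value changes rather than from a specific outcome, the same code covers every face of the die without a per-value groupby.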

pandas streak counter
