Select the minimum from groups in a pandas Series

Date: 2019-06-12 15:10:27

Tags: python pandas group-by pandas-groupby

I have a pandas Series that looks like this:

>>> print(x)
0     1
1     2
2     3
3     4
4     0
5     0
6     0
7     0
8     9
9     6
10    3
11    5
12    7
Name: c, dtype: int64

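I want to find the minimum value within each group of consecutive non-zero numbers. I probably haven't explained that very well, so here is what I would like the output to look like: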

>>> print(result)
0     1
1     1
2     1
3     1
4     0
5     0
6     0
7     0
8     3
9     3
10    3
11    3
12    3
Name: c, dtype: int64

2 answers:

Answer 0: (score: 3)

Use the shift-and-cumsum trick to label each run of consecutive values, then call GroupBy.transform:

u = x.eq(0)
x.groupby(u.ne(u.shift()).cumsum()).transform('min')

0     1
1     1
2     1
3     1
4     0
5     0
6     0
7     0
8     3
9     3
10    3
11    3
12    3
Name: c, dtype: int64
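
To see why the trick works, it can help to print the intermediate steps. The sketch below rebuilds the sample series from the question and shows that every run of consecutive zero or non-zero values ends up with its own group label. The names `is_zero`, `run_start`, and `run_id` are mine for illustration and do not appear in the answer.

```python
import pandas as pd

# Sample data from the question.
x = pd.Series([1, 2, 3, 4, 0, 0, 0, 0, 9, 6, 3, 5, 7], name="c")

# True where the value is zero.
is_zero = x.eq(0)

# True wherever the zero/non-zero state differs from the previous row.
# The first row is compared against NaN, so it always counts as a change.
run_start = is_zero.ne(is_zero.shift())

# The running total of changes gives every run its own integer label.
run_id = run_start.cumsum()
print(run_id.tolist())  # [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3]

# Broadcast each run's minimum back to the original shape.
result = x.groupby(run_id).transform("min")
print(result.tolist())  # [1, 1, 1, 1, 0, 0, 0, 0, 3, 3, 3, 3, 3]
```

Because `transform` returns a result aligned with the input, the output keeps the original index and length.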

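Answer 1: (score: 3)

`for` and Numba

I'd like to use a for loop, but the loop can be sped up with Numba.

Yes: this is a for loop, and it's not very pretty
No: because I'm using Numba (-:

Imports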

import pandas as pd
import numpy as np
from numba import njit

Define the function

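@njit
def f(x):
    y = []  # running minimum of each run of zero / non-zero values
    z = []  # for every element, the position of its run in y
    for a in x:
        if not y:
            # First element: start the first run.
            y.append(a)
            z.append(0)
        else:
            if (y[-1] == 0) ^ (a == 0):
                # The zero/non-zero state flipped: start a new run.
                y.append(a)
                z.append(z[-1] + 1)
            else:
                # Still in the same run: update its minimum.
                y[-1] = min(y[-1], a)
                z.append(z[-1])
    # Expand the per-run minima back to one value per element.
    return np.array(y)[np.array(z)]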

Use the function

pd.Series(f(x.to_numpy()), x.index)

0     1
1     1
2     1
3     1
4     0
5     0
6     0
7     0
8     3
9     3
10    3
11    3
12    3
dtype: int64
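
If Numba is not installed, the same single-pass idea can be sketched in plain Python. This is my own restatement rather than the answerer's code: `run_min_loop`, `mins`, and `labels` are hypothetical names, and without the JIT compiler it will likely be much slower on large inputs.

```python
import numpy as np
import pandas as pd

def run_min_loop(values):
    mins = []    # running minimum of each run
    labels = []  # for every element, the position of its run in `mins`
    for v in values:
        if mins and (mins[-1] == 0) == (v == 0):
            # Same zero/non-zero state as the current run: update its minimum.
            mins[-1] = min(mins[-1], v)
        else:
            # First element, or the state flipped: start a new run.
            mins.append(v)
        labels.append(len(mins) - 1)
    # Expand the per-run minima back to one value per element.
    return np.array(mins)[np.array(labels, dtype=int)]

# Sample data from the question.
x = pd.Series([1, 2, 3, 4, 0, 0, 0, 0, 9, 6, 3, 5, 7], name="c")
result = pd.Series(run_min_loop(x.to_numpy()), x.index)
print(result.tolist())  # [1, 1, 1, 1, 0, 0, 0, 0, 3, 3, 3, 3, 3]
```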

`itertools.groupby`

Credit to room 6 for the assist.

from itertools import groupby, repeat

def repeat_min(x):
    for _, group in groupby(x, key=bool):
        group = list(group)
        minval = min(group)
        yield from repeat(minval, len(group))

pd.Series([*repeat_min(x)], x.index)

0     1
1     1
2     1
3     1
4     0
5     0
6     0
7     0
8     3
9     3
10    3
11    3
12    3
dtype: int64
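
To make the grouping step concrete, the snippet below lists the runs that `itertools.groupby` yields with `key=bool`. It uses a plain list holding the question's values (the name `runs` is mine). Note that `itertools.groupby` only groups consecutive items, which is the behaviour wanted here.

```python
from itertools import groupby

# Values from the question's series.
values = [1, 2, 3, 4, 0, 0, 0, 0, 9, 6, 3, 5, 7]

# bool(v) is False for 0 and True otherwise, so each run of consecutive
# zeros or non-zeros becomes one group.
runs = [(key, list(group)) for key, group in groupby(values, key=bool)]

for key, group in runs:
    print(key, group, "min:", min(group))
# True [1, 2, 3, 4] min: 1
# False [0, 0, 0, 0] min: 0
# True [9, 6, 3, 5, 7] min: 3
```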