Question

Suppose I have a Pandas Series of boolean values like this.

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)

>>> vals
0     False
1     False
2     False
3      True
4      True
5      True
6      True
7     False
8     False
9      True
10     True
11    False
12     True
13     True
14     True
dtype: bool

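I want to turn this boolean Series into a Series in which each group of consecutive 1s is numbered in order (the first group becomes 1, the second becomes 2, and so on) while every 0 stays 0.

How can I do this efficiently?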

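I have been able to do it manually, by looping over the Series at the Python level and incrementing, but that is obviously slow. I am looking for a vectorized solution. I saw this answer from unutbu about splitting on increasing groups in NumPy and tried to get it to work with some kind of cumsum, but have had no success so far.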
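For reference, the Python-level loop described above might look roughly like the sketch below. The function name `enumerate_groups_loop` is hypothetical and not taken from the question; it only serves as a slow baseline against which a vectorized answer can be checked.

```python
import pandas as pd

def enumerate_groups_loop(vals):
    """Label each run of consecutive True values 1, 2, 3, ...; False stays 0."""
    labels = []
    group = 0      # number of True-runs seen so far
    prev = False   # value at the previous position
    for v in vals:
        if v and not prev:
            group += 1  # a new run of True values starts here
        labels.append(group if v else 0)
        prev = v
    return pd.Series(labels, index=vals.index)

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)
result = enumerate_groups_loop(vals)
```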

Answer 1

You can try this:

vals.astype(int).diff().fillna(vals.iloc[0]).eq(1).cumsum().where(vals, 0)

#0     0
#1     0
#2     0
#3     1
#4     1
#5     1
#6     1
#7     0
#8     0
#9     2
#10    2
#11    0
#12    3
#13    3
#14    3
#dtype: int64
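To make the one-liner easier to follow, here is a sketch that splits the same chain into named steps. The intermediate variable names are illustrative only, and `int(vals.iloc[0])` is used in `fillna` as an assumption to keep the column numeric, whereas the one-liner passes the bool directly.

```python
import pandas as pd

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)

as_int = vals.astype(int)                # True/False -> 1/0
step = as_int.diff()                     # +1 where a run starts, -1 where it ends, NaN at position 0
step = step.fillna(int(vals.iloc[0]))    # a leading True counts as the start of a run
starts = step.eq(1)                      # True only at the first element of each run
run_id = starts.cumsum()                 # running count of runs started so far
labels = run_id.where(vals, 0)           # keep the count inside runs, 0 elsewhere
```

The `cumsum` alone would keep carrying the last run number through the gaps between runs, which is why the final `where(vals, 0)` is needed.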

Answer 2

Here is a NumPy approach, which can be applied to the sample or to the sample tiled many times as input -

import numpy as np
import pandas as pd

def island_same_label(vals):

    # Get array for faster processing with NumPy tools, ufuncs
    a = vals.values

    # Initialize output array
    out = np.zeros(a.size, dtype=int)

    # Get start indices for each island of 1s. Set those as 1s
    out[np.flatnonzero(a[1:] > a[:-1])+1] = 1

    # In case 1st element was True, we would have missed it earlier, so add that
    out[0] = a[0]

    # Finally cumsum and mask out non-island regions
    np.cumsum(out, out=out)
    return pd.Series(np.where(a, out, 0))
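A possible way to try the function on the sample and on a tiled copy of it is sketched below. The repeat count `reps` and the use of `timeit` are assumptions made for illustration; no timings from the original answer are reproduced here. The function is repeated so that the snippet runs on its own.

```python
import timeit

import numpy as np
import pandas as pd

def island_same_label(vals):
    a = vals.values
    out = np.zeros(a.size, dtype=int)
    out[np.flatnonzero(a[1:] > a[:-1]) + 1] = 1  # mark the start of each island
    out[0] = a[0]                                # handle a leading True
    np.cumsum(out, out=out)                      # running island count
    return pd.Series(np.where(a, out, 0))        # zero out non-island positions

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1]).astype(bool)
small = island_same_label(vals)

# Tile the sample to build a larger input (reps is an arbitrary choice).
reps = 1000
big_vals = pd.Series(np.tile(vals.values, reps))
big = island_same_label(big_vals)

# Total time in seconds for 10 calls on the tiled input; the value depends on the machine.
elapsed = timeit.timeit(lambda: island_same_label(big_vals), number=10)
```

Because the sample starts with False and ends with True, every tile contributes its own three islands, so the labels keep increasing across the tiled input.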

Answer 3

m=(vals.diff().ne(0)&vals.ne(0)).cumsum()
m[vals.eq(0)]=0
m
Out[235]: 
0     0
1     0
2     0
3     1
4     1
5     1
6     1
7     0
8     0
9     2
10    2
11    0
12    3
13    3
14    3
dtype: int32

Data input

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1])
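The mask-based snippet in this answer can be read as two conditions combined, as in the sketch below. The names `changed`, `nonzero` and `starts` are illustrative and do not appear in the answer.

```python
import pandas as pd

vals = pd.Series([0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1])

changed = vals.diff().ne(0)   # True where the value differs from the previous one (and at position 0, where diff is NaN)
nonzero = vals.ne(0)          # True inside a group of 1s
starts = changed & nonzero    # True only at the first element of each group
m = starts.cumsum()           # running count of groups started so far
m[vals.eq(0)] = 0             # reset positions outside any group to 0
```

Since `NaN != 0` evaluates to True, a Series that begins with a 1 still has its first group counted.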

Incrementing consecutive positive groups in an array/Series
