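How to get the number of rows until the next value

Time: 2017-04-04 08:38:40

Tags: python performance numpy

What is an efficient way to get the number of rows until the next signal value in numpy?

I have a list of signal values (-1, nan, 1) that looks like the table below, and I want to get another list holding the number of rows until the next signal. The sign of the signal (negative or positive) should carry over to the count.

Given the second column of this table, signal, I want to generate the third column, backward.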

+-------+--------+----------+
| index | signal | backward |
+-------+--------+----------+
|     0 |        |          |
|     1 |        |          |
|     2 |        |          |
|     3 |      1 |        4 |
|     4 |        |        3 |
|     5 |        |        2 |
|     6 |        |        1 |
|     7 |     -1 |       -3 |
|     8 |        |       -2 |
|     9 |        |       -1 |
|    10 |      1 |        3 |
|    11 |        |        2 |
|    12 |        |        1 |
|    13 |      1 |        5 |
|    14 |        |        4 |
|    15 |        |        3 |
|    16 |        |        2 |
|    17 |        |        1 |
|    18 |     -1 |       -3 |
|    19 |        |       -2 |
|    20 |        |       -1 |
|    21 |     -1 |       -5 |
|    22 |        |       -4 |
|    23 |        |       -3 |
|    24 |        |       -2 |
|    25 |        |       -1 |
|    26 |      1 |        4 |
|    27 |        |        3 |
|    28 |        |        2 |
|    29 |        |        1 |
+-------+--------+----------+
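To pin down the expected output, the table can be reproduced with a plain loop over the runs. This is only a reference sketch, not an efficient solution: `backward_reference` is a hypothetical name, and filling the rows before the first signal with 0 is an assumption (the table leaves them blank).

```python
import numpy as np

def backward_reference(signal):
    """Signed countdown to the next signal (hypothetical reference helper)."""
    signal = np.asarray(signal, dtype=float)
    # Assumption: rows before the first signal are reported as 0.
    out = np.zeros(signal.size, dtype=int)
    # Positions of the signals (non-NaN entries).
    positions = [i for i, v in enumerate(signal) if not np.isnan(v)]
    # Each run ends where the next signal starts, or at the end of the array.
    boundaries = positions[1:] + [signal.size]
    for start, stop in zip(positions, boundaries):
        sign = int(signal[start])
        # Count down from the run length to 1, carrying the signal's sign.
        out[start:stop] = sign * np.arange(stop - start, 0, -1)
    return out

a = np.full(30, np.nan)
a[[3, 7, 10, 13, 18, 21, 26]] = [1, -1, 1, 1, -1, -1, 1]
print(backward_reference(a))
```

A loop like this is too slow for millions of rows, but it is handy for checking a vectorized version on small inputs.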

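The original numpy array looks something like this. Please excuse the way I create this random list, I don't know a better way :) It is for demonstration purposes only.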

import numpy as np

# Random integers in [-4, 3]; keep only -1 and 1 as signals, set the rest to NaN
data = np.random.randint(-4, 4, (1000,)).astype(float)
data[~np.isin(data, (-1, 1))] = np.nan
print(data)

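The real array has a few million elements, so the solution has to be as efficient as possible.

2 Answers:

Answer 0: (score: 3)

Here's an approach based on cumulative summation -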

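import numpy as np

def seq_descending(a):
    # Positions of the signals (non-NaN entries)
    mask = ~np.isnan(a)
    idx = np.flatnonzero(mask)
    if idx.size == 0:
        # No signal at all: nothing to count down to
        return np.zeros(a.size, dtype=int)
    # Length of each run: distance to the next signal, or to the array end
    shift_idx = np.hstack((idx[1:] - idx[:-1], a.size - idx[-1]))

    # Step array: -1 per row, plus a jump at each signal that restarts the countdown
    out = -np.ones(a.size, dtype=int)
    out[idx] = shift_idx - 1
    idx0 = idx[0]

    # Rows before the first signal stay at 0; the first run starts at its full length
    out[:idx0] = 0
    out[idx0] += 1

    # The cumulative sum turns the steps into countdowns; then apply each run's sign
    cumsums = out.cumsum()
    signs = np.repeat(a[idx].astype(int), shift_idx)
    cumsums[idx0:] *= signs

    return cumsums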

Sample run -

1) Set up the input array:

In [82]: a = np.full((30,), np.nan)
    ...: a[[3,7,10,13,18,21,26]] = [1,-1,1,1,-1,-1,1]
    ...: 

2) Get the output array and stack it next to the input for comparison:

In [83]: np.column_stack((a, seq_descending(a) ))
Out[83]: 
array([[ nan,   0.],
       [ nan,   0.],
       [ nan,   0.],
       [  1.,   4.],
       [ nan,   3.],
       [ nan,   2.],
       [ nan,   1.],
       [ -1.,  -3.],
       [ nan,  -2.],
       [ nan,  -1.],
       [  1.,   3.],
       [ nan,   2.],
       [ nan,   1.],
       [  1.,   5.],
       [ nan,   4.],
       [ nan,   3.],
       [ nan,   2.],
       [ nan,   1.],
       [ -1.,  -3.],
       [ nan,  -2.],
       [ nan,  -1.],
       [ -1.,  -5.],
       [ nan,  -4.],
       [ nan,  -3.],
       [ nan,  -2.],
       [ nan,  -1.],
       [  1.,   4.],
       [ nan,   3.],
       [ nan,   2.],
       [ nan,   1.]])
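To see why the cumulative sum in `seq_descending` produces the countdown, it can help to print the intermediate arrays for a short input. The sketch below repeats the same steps inline on a six-element array; the variable names (`run_len`, `steps`, `countdown`) are mine, chosen for illustration.

```python
import numpy as np

a = np.array([np.nan, 1, np.nan, np.nan, -1, np.nan])

idx = np.flatnonzero(~np.isnan(a))          # signal positions
run_len = np.diff(np.append(idx, a.size))   # rows covered by each signal

# Every row contributes -1, except that each signal adds a jump
# lifting the running total back up to the length of its run.
steps = -np.ones(a.size, dtype=int)
steps[idx] = run_len - 1
steps[:idx[0]] = 0      # rows before the first signal stay at 0
steps[idx[0]] += 1      # the first run starts from 0, not from 1

countdown = steps.cumsum()

# Apply the sign of the signal that owns each run.
result = countdown.copy()
result[idx[0]:] *= np.repeat(a[idx].astype(int), run_len)

print(steps)      # [ 0  3 -1 -1  1 -1]
print(countdown)  # [0 3 2 1 2 1]
print(result)     # [ 0  3  2  1 -2 -1]
```

The jump at each later signal is `run_len - 1` rather than `run_len` because the previous run always ends at 1, so adding `run_len - 1` lands exactly on the new run's length.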

Answer 1: (score: 1)

Data:

array([ nan,  -1.,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,  nan,
        nan,   1.,  -1.,  nan,  -1.,  -1.,  nan,  nan,  -1.])

You can use pandas:

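import numpy as np
import pandas as pd
# Each signal squares to 1 and NaN becomes 0, so the running sum labels each run
df = pd.DataFrame({'id':np.square(np.nan_to_num(data)).cumsum(),'signal':data})

# Count down from the run length to 1 within each run
df['backward'] = df.groupby('id')['id'].transform(lambda x: np.arange(1, len(x)+1)[::-1])

# Apply the sign of the most recent signal
df['backward'] = df['backward']*df.signal.ffill()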

>>> df
    id  signal  backward
0    0     NaN       NaN
1    1      -1       -11
2    1     NaN       -10
3    1     NaN        -9
4    1     NaN        -8
5    1     NaN        -7
6    1     NaN        -6
7    1     NaN        -5
8    1     NaN        -4
9    1     NaN        -3
10   1     NaN        -2
11   1     NaN        -1
12   2       1         1
13   3      -1        -2
14   3     NaN        -1
15   4      -1        -1
16   5      -1        -3
17   5     NaN        -2
18   5     NaN        -1
19   6      -1        -1
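The grouping key is the part of this answer that does the work, so here is a small sketch of how it behaves. It also swaps the per-group lambda for `GroupBy.cumcount(ascending=False)`, which is my substitution rather than part of the answer; it should give the same countdown without calling Python code once per group. The short `data` array is made up for illustration.

```python
import numpy as np
import pandas as pd

data = np.array([np.nan, -1, np.nan, np.nan, 1, -1, np.nan])

# Each signal (+1 or -1) squares to 1 and NaN becomes 0, so the running
# sum goes up by one at every signal and labels the run that follows it.
run_id = np.square(np.nan_to_num(data)).cumsum()

df = pd.DataFrame({'id': run_id, 'signal': data})

# cumcount(ascending=False) counts down to 0 inside each group;
# adding 1 turns that into "run length .. 1".
countdown = df.groupby('id').cumcount(ascending=False) + 1

# Multiply by the sign of the most recent signal; rows before the
# first signal have no sign yet and therefore come out as NaN.
df['backward'] = countdown * df['signal'].ffill()
print(df)
```

Note that rows before the first signal end up as NaN here, whereas the cumulative-sum answer reports them as 0.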