Transforming sections of an array based on groupings

Time: 2019-10-09 15:44:36

Tags: python pandas numpy

I have a DataFrame whose columns consist of groups of 1 and 0 values (indexed by business day). A sample array is given below.

import numpy as np

x = np.array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
              0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
              1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
              0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
              0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1.])

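I am trying to apply some function to each group of 1s. In this case, I want each group of 1s to decay according to some exponential function. I currently use the following to achieve this.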

# Find where the array switches from 1 to 0 and vice versa.
# prepend was used to maintain the original array size.
change = np.abs(np.diff(x, prepend=x[0]))

# split the array into groups of `1` and `0` values.
split = np.split(x, np.flatnonzero(change))

# transform groups of 1's to np.arange
_range = [np.arange(arr.size) if arr[0] == 1 else arr for arr in split]

# concatenate the transformed arrays
new_x = np.concatenate([np.exp(-arr) if arr[-1] != 0 else arr for arr in _range])

This gives the following (using .round(3)):

array([1.   , 0.368, 0.135, 0.05 , 0.018, 0.007, 0.002, 0.001, 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 1.   , 0.368, 0.135, 0.05 ,
       0.018, 0.007, 0.002, 0.001, 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 1.   , 0.368, 0.135, 0.05 , 0.018])
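
To make the steps of the code above easier to follow, here is a rough sketch of the same pipeline on a much shorter array. The array `small` and the names `boundaries`, `groups`, `positions` and `decayed` are invented for this illustration, and the `prepend` argument of `np.diff` assumes NumPy 1.16 or later. One deliberate difference from the code above: the last step tests the original group (`g[0] == 1`) rather than the last element of the position array, because a run consisting of a single 1 has the position array `[0]` and would otherwise be left at 0 instead of becoming `exp(0) = 1`.

```python
import numpy as np

small = np.array([1., 1., 1., 0., 0., 1., 1.])

# 1 wherever a value differs from its predecessor, 0 elsewhere.
change = np.abs(np.diff(small, prepend=small[0]))
boundaries = np.flatnonzero(change)
print(boundaries)  # [3 5]

# Split at the boundaries: [1, 1, 1], [0, 0], [1, 1]
groups = np.split(small, boundaries)

# Replace each run of 1s by its positions 0, 1, 2, ...
positions = [np.arange(g.size) if g[0] == 1 else g for g in groups]

# Decay the runs of 1s and leave the runs of 0s untouched.
# The test uses the original group so that a single 1 is handled too.
decayed = np.concatenate(
    [np.exp(-p) if g[0] == 1 else g for g, p in zip(groups, positions)]
)
print(decayed.round(3))
```

The counter restarts at every run of 1s, so each run begins again at `exp(0) = 1`.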

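Is there a better way, using pandas or numpy, to split-apply-combine a DataFrame column while "restarting" some function for each individual group/window of values? As mentioned above, the data is a time series, so each group of 1s and 0s defines the start and end of a new but similar process.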

1 answer:

Answer 0: (score: 2)

Here is a way to do it with pandas:

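import pandas as pd

s = pd.Series(x)

# Number the elements within each run of 1s: grouping by the value and by the
# running count of non-1 entries keeps every run of 1s in one group.
_range = s.groupby([s, s.ne(1).cumsum()]).cumcount()

# or, label every run of equal values and zero the counts for the runs of 0s:
# _range = s.groupby(s.ne(s.shift()).cumsum()).cumcount() * s

# Decay within each run of 1s; multiplying by s keeps the 0s at 0.
new_x = np.exp(-_range) * s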

new_x.values gives the np.array:

array([1.00000000e+00, 3.67879441e-01, 1.35335283e-01, 4.97870684e-02,
       1.83156389e-02, 6.73794700e-03, 2.47875218e-03, 9.11881966e-04,
       3.35462628e-04, 1.23409804e-04, 4.53999298e-05, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 1.00000000e+00, 3.67879441e-01,
       1.35335283e-01, 4.97870684e-02, 1.83156389e-02, 6.73794700e-03,
       2.47875218e-03, 9.11881966e-04, 3.35462628e-04, 1.23409804e-04,
       4.53999298e-05, 1.67017008e-05, 6.14421235e-06, 2.26032941e-06,
       8.31528719e-07, 3.05902321e-07, 1.12535175e-07, 4.13993772e-08,
       1.52299797e-08, 5.60279644e-09, 2.06115362e-09, 7.58256043e-10,
       2.78946809e-10, 1.02618796e-10, 3.77513454e-11, 1.38879439e-11,
       5.10908903e-12, 1.87952882e-12, 6.91440011e-13, 2.54366565e-13,
       9.35762297e-14, 3.44247711e-14, 1.26641655e-14, 4.65888615e-15,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
       0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.00000000e+00,
       3.67879441e-01, 1.35335283e-01, 4.97870684e-02, 1.83156389e-02])
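
To see why the grouping key restarts the counter at every run, it can help to look at the intermediate values on a small series. The sketch below is illustrative only: the short series, the helper name `decay_ones` and the DataFrame with columns `a` and `b` are all made up for this example. The helper simply wraps the second (commented) variant above so that it can be applied to each column of a DataFrame, which is one possible reading of what the question asks for.

```python
import numpy as np
import pandas as pd

s = pd.Series([1., 1., 1., 0., 0., 1., 1.])

# A new run starts wherever a value differs from the previous one;
# the cumulative sum of those starts gives every run its own label.
run_id = s.ne(s.shift()).cumsum()
print(run_id.tolist())    # [1, 1, 1, 2, 2, 3, 3]

# cumcount numbers the rows inside each run, starting again from 0.
position = s.groupby(run_id).cumcount()
print(position.tolist())  # [0, 1, 2, 0, 1, 0, 1]


def decay_ones(col):
    """Hypothetical helper: exponential decay restarted at each run of 1s."""
    run_id = col.ne(col.shift()).cumsum()
    position = col.groupby(run_id).cumcount()
    # Multiplying by col keeps the runs of 0s at 0.
    return np.exp(-position) * col


# Column names and values are invented for the example.
df = pd.DataFrame({"a": [1., 1., 0., 1.], "b": [0., 1., 1., 1.]})
decayed = df.apply(decay_ones)
```

To restart a different function, replace `np.exp(-position)` with any other expression of `position`.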