I have a DataFrame whose columns consist of groups of 1 and 0 values (indexed by business day). An example array is given below.
import numpy as np

x = np.array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 0., 0., 0., 0.,
0., 0., 0., 0.,0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1.])
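I am trying to apply some function to each group of 1s. In this case, each group of 1s should decay according to some exponential function. I currently use the following to achieve this.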
# Find where the array switches from 1 to 0 and vice versa.
# prepend was used to maintain the original array size.
change = np.abs(np.diff(x, prepend=x[0]))
# split the array into groups of `1` and `0` values.
split = np.split(x, np.flatnonzero(change))
# transform groups of 1's to np.arange
_range = [np.arange(arr.size) if arr[0] == 1 else arr for arr in split]
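# apply the decay to the arange groups and concatenate the result.
# caveat: a run consisting of a single 1 becomes [0] in the step above,
# so the arr[-1] != 0 test skips it and it stays 0 instead of 1.
new_x = np.concatenate([np.exp(-arr) if arr[-1] != 0 else arr for arr in _range])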
This gives the following (after .round(3)):
array([1.   , 0.368, 0.135, 0.05 , 0.018, 0.007, 0.002, 0.001, 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 1.   , 0.368, 0.135, 0.05 ,
       0.018, 0.007, 0.002, 0.001, 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   , 0.   ,
       0.   , 0.   , 0.   , 0.   , 0.   , 1.   , 0.368, 0.135, 0.05 , 0.018])
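Is there a better way, using pandas or numpy, to split-apply-combine a DataFrame column while "restarting" some function for each individual group/window of values? As mentioned above, the data is a time series, so each group of 1s or 0s defines the start and end of a new but similar process.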
Answer 0 (score: 2)
You are in pandas territory here:
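import pandas as pd

s = pd.Series(x)
# number the elements within each run of 1s; every 0 ends up in its own group
_range = s.groupby([s, s.ne(1).cumsum()]).cumcount()
# or
# _range = s.groupby(s.ne(s.shift()).cumsum()).cumcount() * s
# decay each run of 1s; multiplying by s keeps the 0s at 0
new_x = np.exp(-_range) * s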
new_x.values is an np.array:
array([1.00000000e+00, 3.67879441e-01, 1.35335283e-01, 4.97870684e-02,
1.83156389e-02, 6.73794700e-03, 2.47875218e-03, 9.11881966e-04,
3.35462628e-04, 1.23409804e-04, 4.53999298e-05, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 1.00000000e+00, 3.67879441e-01,
1.35335283e-01, 4.97870684e-02, 1.83156389e-02, 6.73794700e-03,
2.47875218e-03, 9.11881966e-04, 3.35462628e-04, 1.23409804e-04,
4.53999298e-05, 1.67017008e-05, 6.14421235e-06, 2.26032941e-06,
8.31528719e-07, 3.05902321e-07, 1.12535175e-07, 4.13993772e-08,
1.52299797e-08, 5.60279644e-09, 2.06115362e-09, 7.58256043e-10,
2.78946809e-10, 1.02618796e-10, 3.77513454e-11, 1.38879439e-11,
5.10908903e-12, 1.87952882e-12, 6.91440011e-13, 2.54366565e-13,
9.35762297e-14, 3.44247711e-14, 1.26641655e-14, 4.65888615e-15,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 0.00000000e+00,
0.00000000e+00, 0.00000000e+00, 0.00000000e+00, 1.00000000e+00,
3.67879441e-01, 1.35335283e-01, 4.97870684e-02, 1.83156389e-02])
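To see why the second groupby key restarts the count for every run, it can help to look at the intermediate values on a short input. The following is only a sketch: the nine-element series, the names run_id and pos, and the decay helper with its rate parameter are made up for illustration and are not part of the question or the answer above.

```python
import numpy as np
import pandas as pd

# Small hypothetical input: runs of 1s separated by runs of 0s.
s = pd.Series([1., 1., 1., 0., 0., 1., 1., 0., 1.])

# A new run starts wherever a value differs from its predecessor;
# the cumulative sum of those starts gives every run its own id.
run_id = s.ne(s.shift()).cumsum()

# Position of each element inside its run, restarting at 0 per run.
pos = s.groupby(run_id).cumcount()


def decay(t, rate=1.0):
    """Hypothetical per-run function; rate=1.0 reproduces np.exp(-t)."""
    return np.exp(-rate * t)


# Apply the function to the positions, then mask the runs of 0s back to 0.
result = decay(pos, rate=0.5) * s

print(run_id.tolist())  # run ids
print(pos.tolist())     # positions within each run
print(result.round(3).tolist())
```

Because the position is computed per run and the masking is done by multiplying with s, a run that contains a single 1 still comes out as 1. Any other function of the within-run position could presumably be swapped in for decay in the same way.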