Vectorizing the sum of interpolations over many arrays

Date: 2016-05-23 14:33:48

Tags: python numpy math scipy vectorization

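I have a large set (thousands) of smooth lines (series of x, y pairs). The lines are sampled at different x and y values, and each line has a different length, i.e.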

x_0 = {x_00, x_01, ..., }  # length n_0
x_1 = {x_10, x_11, ..., }  # length n_1
...
x_m = {x_m0, x_m1, ..., }  # length n_m

y_0 = {y_00, y_01, ..., }  # length n_0
y_1 = {y_10, y_11, ..., }  # length n_1
...
y_m = {y_m0, y_m1, ..., }  # length n_m

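I want to find cumulative properties of the lines after interpolating each one onto a common, regular set of x points, i.e. x = {x_0, x_1, ..., x_n-1}.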

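Currently I for-loop over every line, create an interpolant, resample it, and then take the sum/median/whatever of the results. It works, but it's really slow. Is there a way to vectorize (or matrix-ize) this operation?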

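I was thinking that since linear interpolation can be written as a matrix operation, this might be possible. On the other hand, since each line can have a different length, it could get complicated. Edit: but zero-padding the shorter arrays would be easy...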
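To make the matrix idea concrete, here is a minimal sketch; the helper names `interp_matrix` and `stacked_interp` are hypothetical and not from the post. For a fixed grid `xref` and a strictly increasing `xi`, linear interpolation is a matrix `W` with at most two nonzeros per row, so `W @ yi` reproduces `np.interp(xref, xi, yi)`. Zero-padding works because the padded columns of `W` are all zero, so the padded zeros in `y` never contribute. Building each `W` still loops over the lines, so this illustrates the idea rather than promising a speed-up.

```python
import numpy as np


def interp_matrix(xref, xi):
    """Hypothetical helper: return W such that W @ yi == np.interp(xref, xi, yi).

    Assumes xi is strictly increasing with at least two points; like
    np.interp, queries outside [xi[0], xi[-1]] take the end values.
    """
    xref = np.asarray(xref, dtype=float)
    xi = np.asarray(xi, dtype=float)
    # Right end of the segment that brackets each query point.
    j = np.clip(np.searchsorted(xi, xref, side="right"), 1, len(xi) - 1)
    # Fractional position inside that segment, clamped for out-of-range queries.
    t = np.clip((xref - xi[j - 1]) / (xi[j] - xi[j - 1]), 0.0, 1.0)
    W = np.zeros((len(xref), len(xi)))
    rows = np.arange(len(xref))
    W[rows, j - 1] = 1.0 - t
    W[rows, j] = t
    return W


def stacked_interp(xref, xx, yy):
    """Hypothetical helper: zero-pad every line to the longest length, then
    apply all interpolation matrices in one einsum call."""
    n_max = max(len(xi) for xi in xx)
    W = np.zeros((len(xx), len(xref), n_max))
    Y = np.zeros((len(xx), n_max))
    for i, (xi, yi) in enumerate(zip(xx, yy)):
        W[i, :, : len(xi)] = interp_matrix(xref, xi)
        Y[i, : len(yi)] = yi
    # Padded columns of W are zero, so the padded zeros in Y never contribute.
    return np.einsum("mkn,mn->mk", W, Y)


# Tiny check: xi = [0, 1, 3], yi = [0, 2, 6] evaluated at 0.5 and 2.0.
xi = np.array([0.0, 1.0, 3.0])
yi = np.array([0.0, 2.0, 6.0])
print(interp_matrix([0.5, 2.0], xi) @ yi)  # [1. 4.]
```

Because interpolation is linear in `y`, a sum over lines at each reference point is then just `stacked_interp(xref, xx, yy).sum(axis=0)`.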

What I'm doing now looks something like this:

import numpy as np

...

# `xx` and `yy` are lists of lists with the x and y points respectively
# `xref` are the reference x values at which I want interpolants
yref = np.zeros([len(xx), len(xref)])
for ii, (xi, yi) in enumerate(zip(xx, yy)):
    # linearly interpolate line `ii` onto the reference grid
    # (`scipy.interp` was only a deprecated alias for np.interp)
    yref[ii] = np.interp(xref, xi, yi)

# aggregate along the last axis, i.e. over the reference points of each line
y_med = np.median(yref, axis=-1)
y_sum = np.sum(yref, axis=-1)
...
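One way to drop the Python-level interpolation loop is sketched below, assuming every line is strictly increasing in x and has at least two points (`interp_many` is a hypothetical name, not an existing NumPy function). It shifts each line into its own disjoint x-interval, concatenates everything into one sorted array, and uses a single `np.searchsorted` call to find the bracketing segment for every (line, reference point) pair. Each row should match the `np.interp` loop above, including its clamping at the ends.

```python
import numpy as np


def interp_many(xref, xx, yy):
    """Evaluate many ragged piecewise-linear lines on one common grid.

    Hypothetical helper. Returns shape (len(xx), len(xref)); row i should match
    np.interp(xref, xx[i], yy[i]). Assumes each xx[i] is strictly increasing
    and has at least two points.
    """
    xref = np.asarray(xref, dtype=float)
    m = len(xx)
    lengths = np.array([len(xi) for xi in xx])
    starts = np.concatenate(([0], np.cumsum(lengths)[:-1]))
    x_all = np.concatenate(xx).astype(float)
    y_all = np.concatenate(yy).astype(float)

    # Give line i the x-interval [i*span, (i+1)*span) so all lines fit in one
    # sorted array and one searchsorted call brackets every query.
    lo = min(x_all.min(), xref.min())
    span = max(x_all.max(), xref.max()) - lo + 1.0
    row = np.repeat(np.arange(m), lengths)
    x_shift = (x_all - lo) + row * span
    q = (xref - lo)[None, :] + np.arange(m)[:, None] * span  # shape (m, k)

    # Right end of each bracketing segment, clipped to stay inside line i so
    # that out-of-range queries use the line's first or last segment.
    j = np.searchsorted(x_shift, q, side="right")
    j = np.clip(j, starts[:, None] + 1, (starts + lengths - 1)[:, None])

    x0, x1 = x_all[j - 1], x_all[j]
    y0, y1 = y_all[j - 1], y_all[j]
    # Clamping t to [0, 1] mimics np.interp's constant values beyond the ends.
    t = np.clip((xref[None, :] - x0) / (x1 - x0), 0.0, 1.0)
    return y0 + t * (y1 - y0)
```

The result has the same shape as `yref`, so the `np.median`/`np.sum` calls apply unchanged. With very many lines or widely spread x values the shifted coordinates become large, which costs some floating-point precision when locating segments.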

1 Answer:

Answer 0 (score: 1)

Hopefully you can adapt the following to your needs.

I included pandas because it has an interpolate function for filling in missing values.

Setup

import pandas as pd
import numpy as np

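x = np.arange(19)      # common grid: 0, 1, ..., 18
x_0 = x[::2]           # first line sampled at every 2nd grid point
x_1 = x[::3]           # second line sampled at every 3rd grid point

np.random.seed([3,1415])
y_0 = x_0 + np.random.randn(len(x_0)) * 2   # noisy y values
y_1 = x_1 + np.random.randn(len(x_1)) * 2

# one single-column DataFrame per line, indexed by its own x values
xy_0 = pd.DataFrame(y_0, index=x_0)
xy_1 = pd.DataFrame(y_1, index=x_1)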

Note:

  • x has length 19
  • x_0 has length 10
  • x_1 has length 7

xy_0 looks like:

            0
0   -4.259448
2   -0.536932
4    0.059001
6    1.481890
8    7.301427
10   9.946090
12  12.632472
14  14.697564
16  17.430729
18  19.541526

xy_0 can be aligned with x via reindex:

xy_0.reindex(x)

            0
0   -4.259448
1         NaN
2   -0.536932
3         NaN
4    0.059001
5         NaN
6    1.481890
7         NaN
8    7.301427
9         NaN
10   9.946090
11        NaN
12  12.632472
13        NaN
14  14.697564
15        NaN
16  17.430729
17        NaN
18  19.541526

Then we can fill in the missing values with interpolate:

xy_0.reindex(x).interpolate()

            0
0   -4.259448
1   -2.398190
2   -0.536932
3   -0.238966
4    0.059001
5    0.770445
6    1.481890
7    4.391659
8    7.301427
9    8.623759
10   9.946090
11  11.289281
12  12.632472
13  13.665018
14  14.697564
15  16.064147
16  17.430729
17  18.486128
18  19.541526
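One caveat with this step: `interpolate()` defaults to `method="linear"`, which treats the rows as equally spaced and ignores the index values. That is fine here because `x` is `np.arange(19)`, but for irregularly spaced x values `method="index"` uses the actual distances. A small check with made-up values:

```python
import numpy as np
import pandas as pd

# Uneven x spacing: the gap at x = 1 sits between samples at x = 0 and x = 10.
s = pd.Series([0.0, np.nan, 10.0], index=[0.0, 1.0, 10.0])

by_position = s.interpolate()             # default method="linear": equal spacing
by_index = s.interpolate(method="index")  # uses the actual x distances

print(by_position.loc[1.0])  # 5.0
print(by_index.loc[1.0])     # 1.0
```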

What about xy_1?

xy_1.reindex(x)

            0
0   -1.216416
1         NaN
2         NaN
3    3.704781
4         NaN
5         NaN
6    5.294958
7         NaN
8         NaN
9    8.168262
10        NaN
11        NaN
12  10.176849
13        NaN
14        NaN
15  14.714924
16        NaN
17        NaN
18  19.493678

Interpolate:

xy_1.reindex(x).interpolate()

            0
0   -1.216416
1    0.423983
2    2.064382
3    3.704781
4    4.234840
5    4.764899
6    5.294958
7    6.252726
8    7.210494
9    8.168262
10   8.837791
11   9.507320
12  10.176849
13  11.689541
14  13.202233
15  14.714924
16  16.307842
17  17.900760
18  19.493678
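To apply the same reindex-and-interpolate idea to thousands of lines and then aggregate, one possible sketch follows (the helper `resample_lines` is hypothetical, not part of the answer above). It builds one column per line, aligns all columns on the union of their x values plus `xref`, interpolates by index, and keeps only the `xref` rows. Here `limit_area="inside"` leaves NaN outside each line's own x range instead of extrapolating, which differs from `np.interp`'s clamping.

```python
import numpy as np
import pandas as pd


def resample_lines(xx, yy, xref):
    """Hypothetical helper: resample many ragged lines onto `xref` with pandas.

    Returns a DataFrame indexed by xref with one column per line. Assumes each
    xx[i] has unique x values.
    """
    # One Series per line, indexed by that line's own x values; concat aligns
    # them as columns on the combined x values (NaN where a line has no sample).
    df = pd.concat([pd.Series(yi, index=xi) for xi, yi in zip(xx, yy)], axis=1)
    # Make sure every reference point is a row, and sort the rows by x.
    df = df.reindex(df.index.union(pd.Index(xref)).sort_values())
    # method="index" weights by x distance; limit_area="inside" only fills
    # gaps that lie between two of a line's own samples.
    df = df.interpolate(method="index", limit_area="inside")
    return df.loc[xref]
```

With `resampled = resample_lines(xx, yy, xref)`, `resampled.sum(axis=1)` or `resampled.median(axis=1)` gives one value per reference point, aggregated across lines; both skip NaN by default.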