sklearn linear regression for a 2D array

Date: 2020-01-01 19:42:54

Tags: python-3.x scikit-learn numpy-ndarray

I have a NumPy 2D array whose rows are individual time series and whose columns correspond to time points. I want to fit a regression line to each row to measure the trend of each time series, and I think I could do it (however inefficiently) with a loop:

array2D = ...
for row in array2D:
    coeffs = sklearn.linear_model.LinearRegression().fit(row[:, None], range(len(row))).coef_
    ...

Is there a way to do this without a loop? And what would the final shape of coeffs be?

2 answers:

Answer 0 (score: 2)

For people like me who prefer X to be the range and the time data to be y:


PS: This approach (linear regression via matrix multiplication) is a gold mine for large datasets.
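The code that originally accompanied this answer appears to have been lost. A minimal sketch of the orientation it describes, assuming the design matrix is the shared time index plus a bias column and each row of the 2D array is a separate y:

```python
import numpy as np

def solve_swapped(timeseries):
    # Sketch of the "X is the range, time data is y" orientation.
    # Every row of `timeseries` is one y; X is the same for all rows.
    n_series, n_samples = timeseries.shape
    t = np.arange(n_samples)

    # Shared design matrix: a bias column and the time index
    X = np.column_stack([np.ones(n_samples), t])  # (n_samples, 2)

    # Normal equations beta = (X^T X)^-1 X^T y, solved once for all
    # rows by stacking the series as columns of y
    beta = np.linalg.solve(X.T @ X, X.T @ timeseries.T)  # (2, n_series)

    # Row 1 holds the slopes, one per series
    return beta[1]
```

Because X is shared across all rows, the (XᵀX) factor is formed and solved once, which is what makes the matrix-multiplication route cheap on large datasets.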

Answer 1 (score: 1)

The coefficients that minimize the linear regression error are

β = (XᵀX)⁻¹ Xᵀ y

You can use numpy to solve all rows at once.

import numpy as np
from sklearn.linear_model import LinearRegression

def solve(timeseries):

    n_samples = timeseries.shape[1]
    # slope and offset/bias
    n_features = 2
    n_series = timeseries.shape[0]

    # For a single time series, X would be of shape
    # (n_samples, n_features); however, in this case
    # it will be (n_samples, n_features, n_series).
    # The bias is added by having a feature that is all 1's
    X = np.ones((n_samples, n_features, n_series))
    X[:, 1, :] = timeseries.T

    y = np.arange(n_samples)

    # A is the matrix to be inverted and will
    # be of shape (n_series, n_features, n_features)
    A = X.T @ X.transpose(2, 0, 1)
    A_inv = np.linalg.inv(A) 

    # Do the other multiplications step by step
    B = A_inv @ X.T
    C = B @ y 

    # Return only the slopes (which is what .coef_ does in sklearn)
    return C[:,1]

array2D = np.random.random((3,10))
coeffs_loop = np.empty(array2D.shape[0])
for i, row in enumerate(array2D):
    coeffs = LinearRegression().fit(row[:, None], range(len(row))).coef_
    coeffs_loop[i] = coeffs

coeffs_vectorized = solve(array2D)

print(np.allclose(coeffs_loop, coeffs_vectorized))
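As an aside (not part of either answer), the slope of regressing the time index on each row also has the closed form cov(t, x) / var(x), so the same result can be sketched without any matrix inversion:

```python
import numpy as np

def slopes_closed_form(timeseries):
    # Slope of regressing the time index t on each row x equals
    # cov(t, x) / var(x); computed row-wise with plain broadcasting.
    n_samples = timeseries.shape[1]
    t = np.arange(n_samples)
    xc = timeseries - timeseries.mean(axis=1, keepdims=True)  # center rows
    tc = t - t.mean()                                         # center time
    return (xc * tc).sum(axis=1) / (xc ** 2).sum(axis=1)
```

This matches the `C[:, 1]` slopes from the vectorized solve above and sidesteps `np.linalg.inv`, which is often both faster and numerically safer.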