I have a NumPy 2D array whose rows are individual time series and whose columns correspond to time points. I want to fit a regression line to each row to measure the trend of each time series, and I figure I could do this (inefficient as it is) with a loop:
array2D = ...
for row in array2D:
    coeffs = sklearn.linear_model.LinearRegression().fit(row[:, None], range(len(row))).coef_
    ...
Is there a way to do this without the loop? And what would the final shape of coeffs be?
Answer 0 (score: 2)
For those like me who prefer the range as X and the time data as y:
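A minimal sketch of that orientation (the helper name trend_slopes below is my own): build one shared design matrix from the time index, then solve the normal equation for every row at once using plain matrix products.

import numpy as np

def trend_slopes(timeseries):
    # timeseries: (n_series, n_samples); the time index is X, each row's values are y
    n_series, n_samples = timeseries.shape
    t = np.arange(n_samples)
    # Shared design matrix: a column of ones (intercept) and the time index (slope)
    X = np.column_stack([np.ones(n_samples), t])        # (n_samples, 2)
    # Normal equation beta = (X^T X)^-1 X^T y, applied to all rows at once
    # by stacking every row's y as a column of a (n_samples, n_series) matrix
    beta = np.linalg.inv(X.T @ X) @ X.T @ timeseries.T  # (2, n_series)
    return beta[1]                                      # one slope per row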
PS: This approach (linear regression via matrix multiplication) is a goldmine for large datasets.
Answer 1 (score: 1)
The coefficients that minimize the linear regression error are the normal-equation solution (XᵀX)⁻¹Xᵀy. You can use numpy to solve this for all rows at once.
import numpy as np
from sklearn.linear_model import LinearRegression
def solve(timeseries):
    n_samples = timeseries.shape[1]
    # slope and offset/bias
    n_features = 2
    n_series = timeseries.shape[0]
    # For a single time series, X would be of shape
    # (n_samples, n_features); however, in this case
    # it will be (n_samples, n_features, n_series).
    # The bias is added by having one feature that is all 1's.
    X = np.ones((n_samples, n_features, n_series))
    X[:, 1, :] = timeseries.T
    y = np.arange(n_samples)
    # A is the matrix to be inverted and will
    # be of shape (n_series, n_features, n_features)
    A = X.T @ X.transpose(2, 0, 1)
    A_inv = np.linalg.inv(A)
    # Do the other multiplications step by step
    B = A_inv @ X.T
    C = B @ y
    # Return only the slopes (which is what .coef_ gives in sklearn)
    return C[:, 1]
array2D = np.random.random((3,10))
coeffs_loop = np.empty(array2D.shape[0])
for i, row in enumerate(array2D):
    coeffs = LinearRegression().fit(row[:, None], range(len(row))).coef_
    coeffs_loop[i] = coeffs[0]
coeffs_vectorized = solve(array2D)
print(np.allclose(coeffs_loop, coeffs_vectorized))
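As for the shape question in the original post: solve returns one slope per row, so coeffs_vectorized has shape (n_series,), here (3,). A quick check:

print(coeffs_vectorized.shape)  # (3,)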