I have a NumPy 2D array whose rows are individual time series and whose columns correspond to time points. I want to fit a regression line to each row to measure the trend of each time series, and I figure I could do this (inefficient as it is) with a loop:
array2D = ...
for row in array2D:
    coeffs = sklearn.linear_model.LinearRegression().fit(row[:, None], range(len(row))).coef_
    ...
Is there a way to do this without the loop? And what would the final shape of coeffs be?
Answer 0 (score: 2)
For those like me who prefer the range as X and the time data as y:
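A minimal sketch of that orientation (the helper name trend_slopes below is my own): build one shared design matrix from the time index, then solve the normal equation for every row at once using plain matrix products.

import numpy as np

def trend_slopes(timeseries):
    # timeseries: (n_series, n_samples); the time index is X, each row's values are y
    n_series, n_samples = timeseries.shape
    t = np.arange(n_samples)
    # Shared design matrix: a column of ones (intercept) and the time index (slope)
    X = np.column_stack([np.ones(n_samples), t])        # (n_samples, 2)
    # Normal equation beta = (X^T X)^-1 X^T y, applied to all rows at once
    # by stacking every row's y as a column of a (n_samples, n_series) matrix
    beta = np.linalg.inv(X.T @ X) @ X.T @ timeseries.T  # (2, n_series)
    return beta[1]                                      # one slope per row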
PS: This approach (linear regression via matrix multiplication) is a goldmine for large datasets.
Answer 1 (score: 1)
The coefficients that minimize the linear regression error are the normal-equation solution (XᵀX)⁻¹Xᵀy. You can use numpy to solve this for all rows at once.
import numpy as np
from sklearn.linear_model import LinearRegression
def solve(timeseries):
    n_samples = timeseries.shape[1]
    # slope and offset/bias
    n_features = 2
    n_series = timeseries.shape[0]
    # For a single time series, X would be of shape
    # (n_samples, n_features); however, in this case
    # it will be (n_samples, n_features, n_series).
    # The bias is added by having one feature that is all 1's.
    X = np.ones((n_samples, n_features, n_series))
    X[:, 1, :] = timeseries.T
    y = np.arange(n_samples)
    # A is the matrix to be inverted and will
    # be of shape (n_series, n_features, n_features)
    A = X.T @ X.transpose(2, 0, 1)
    A_inv = np.linalg.inv(A)
    # Do the other multiplications step by step
    B = A_inv @ X.T
    C = B @ y
    # Return only the slopes (which is what .coef_ gives in sklearn)
    return C[:, 1]
array2D = np.random.random((3,10))
coeffs_loop = np.empty(array2D.shape[0])
for i, row in enumerate(array2D):
    coeffs = LinearRegression().fit(row[:, None], range(len(row))).coef_
    coeffs_loop[i] = coeffs[0]
coeffs_vectorized = solve(array2D)
print(np.allclose(coeffs_loop, coeffs_vectorized))
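As for the shape question in the original post: solve returns one slope per row, so coeffs_vectorized has shape (n_series,), here (3,). A quick check:

print(coeffs_vectorized.shape)  # (3,)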