Partial fit of a multivariate SGDRegressor

Time: 2014-04-07 14:18:55

Tags: machine-learning scikit-learn regression

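I'm currently trying to solve a multivariate target problem on a large data set, X ~ (10^6, 10^4), using the SGDRegressor from scikit-learn. Because of the size, I generate the design matrix (X) in parts with the following code, where each iteration produces a batch of shape (10^3, 10^4):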

design = self.__iterX__(events)
reglins = [linear_model.SGDRegressor(fit_intercept=True) for i in range(nTargets)]

for X,times in design:
    for i in range(nTargets):
        reglins[i].partial_fit(X,y.ix[times].values[:,i])

But I get the following stack trace:

File ".../Enthought/Canopy_64bit/User/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 841, in partial_fit
    coef_init=None, intercept_init=None)
File ".../Enthought/Canopy_64bit/User/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 812, in _partial_fit
    sample_weight, n_iter)
File ".../Enthought/Canopy_64bit/User/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 948, in _fit_regressor
    intercept_decay)
File "sgd_fast.pyx", line 508, in sklearn.linear_model.sgd_fast.plain_sgd (sklearn/linear_model/sgd_fast.c:8651)
ValueError: floating-point under-/overflow occurred.

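From looking around, this seems to happen because X is not properly normalized. I know scikit-learn has various utilities for this, but given that I generate X in chunks, is it enough to simply normalize each chunk on its own, or do I need to find a way to normalize entire columns at once?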

On a side note, is there a particular reason why the partial_fit function does not allow multivariate targets?

1 answer:

Answer 0: (score: 3)

You can fit the scaler on one block and then apply it to the other blocks:

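from sklearn import preprocessing
scaler = preprocessing.StandardScaler()
# Learn the per-column mean and standard deviation from the first block
x1 = scaler.fit_transform(X_block_1)
# Reuse those same statistics for every later block
xn = scaler.transform(X_block_n)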
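If the first block is not representative of the whole data set, the column statistics can instead be accumulated over all blocks. The following is a minimal sketch, assuming a scikit-learn version in which StandardScaler provides partial_fit (this is newer than the version in the question). The make_block helper and the synthetic data are hypothetical stand-ins for the chunks produced by the question's generator.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
n_features = 20
true_coef = rng.randn(n_features)


def make_block(n_rows=200):
    # Hypothetical stand-in for one chunk of the design matrix:
    # the features are deliberately large and far from zero mean.
    X = rng.randn(n_rows, n_features) * 1e3 + 500.0
    y = ((X - 500.0) / 1e3).dot(true_coef)
    return X, y


# Kept in a list only to make the sketch repeatable; with real data the
# blocks would be regenerated or re-read on each pass.
blocks = [make_block() for _ in range(10)]

# Pass 1: accumulate per-column mean and variance across all blocks.
scaler = StandardScaler()
for X, _ in blocks:
    scaler.partial_fit(X)

# Pass 2: train on blocks scaled with the global statistics.
reg = SGDRegressor(fit_intercept=True, random_state=0)
for epoch in range(5):
    for X, y in blocks:
        reg.partial_fit(scaler.transform(X), y)
```

Normalizing each chunk with its own mean and standard deviation would put every chunk on a slightly different scale, so one shared scaler is the safer choice. If a second pass over the data is too expensive, calling scaler.partial_fit and then scaler.transform on each block inside a single loop is a possible compromise, at the cost of noisier statistics for the earliest blocks.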

You can also choose other normalization methods from this page.
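On the side question about multivariate targets: SGDRegressor fits a single target vector, so keeping one regressor per target, as in the question, is a reasonable workaround. In newer scikit-learn versions the same pattern can be written with MultiOutputRegressor, which fits one copy of the estimator per target column. This is a sketch on synthetic, already standardized data, and it assumes a version where MultiOutputRegressor supports partial_fit; the names W, X and Y are made up for the illustration.

```python
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.multioutput import MultiOutputRegressor

rng = np.random.RandomState(0)
n_features, n_targets = 10, 3
W = rng.randn(n_features, n_targets)  # hypothetical true weights

# One SGDRegressor is cloned and trained per column of Y.
model = MultiOutputRegressor(SGDRegressor(fit_intercept=True, random_state=0))

for _ in range(20):
    X = rng.randn(100, n_features)  # synthetic block, already standardized
    Y = X.dot(W)                    # shape (100, n_targets)
    model.partial_fit(X, Y)

pred = model.predict(rng.randn(5, n_features))  # shape (5, n_targets)
```

Under the hood this still trains independent single-target models, so it changes the bookkeeping rather than the result.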