Getting different scalings in an sklearn grid search

Time: 2018-12-13 10:06:53

Tags: python scikit-learn time-series pipeline grid-search

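I am trying to set up a GridSearchCV in sklearn with a TimeSeriesSplit, normalizing the data with a divisor computed on the training data of each split. My approach is to create a TransformerMixin called DivisorTransform that computes the normalization divisor and stores it. The DivisorTransform is instantiated before the Pipeline. In the pipeline I first add the DivisorTransform (so that it gets fitted) and then a NormalizeTransformer, which takes the DivisorTransform as input and performs the division. However, passing the pipeline to a GridSearchCV causes the transformers to be pickled (copied). The DivisorTransform is pickled and fitted, then the NormalizeTransformer is pickled, but the DivisorTransform it holds is pickled again. As a result, the NormalizeTransformer uses a DivisorTransform that was never fitted. Here is an example: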

dt = DivisorTransform()
pipe = Pipeline([('divisor',dt),('normalize',NormalizeTransformer(dt))])
gridS = GridSearchCV(pipe,param_grid={...},cv=TimeSeriesSplit())
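The copying described above can be checked outside of a grid search. The following is a minimal sketch, assuming the search copies estimators with `sklearn.base.clone` before each fit. The classes `Inner` and `Outer` are hypothetical stand-ins for the two transformers, not the original code.

```python
from sklearn.base import BaseEstimator, TransformerMixin, clone


class Inner(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for DivisorTransform."""

    def fit(self, X, y=None):
        self.divisor_ = max(X)  # learned state, set only in fit()
        return self

    def transform(self, X):
        return X


class Outer(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in for a transformer that holds another one."""

    def __init__(self, inner=None):
        # Stored under the same name as the constructor argument,
        # which is what get_params() and clone() rely on.
        self.inner = inner

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        return X


inner = Inner().fit([1, 2, 4])  # fitted before being handed over
outer = Outer(inner)
outer_copy = clone(outer)       # unfitted copy with the same parameters

# The copy holds its own Inner, and that Inner has no fitted state.
shares_inner = outer_copy.inner is inner
copy_is_fitted = hasattr(outer_copy.inner, 'divisor_')
print(shares_inner, copy_is_fitted, hasattr(inner, 'divisor_'))
```

In this sketch the nested estimator is copied together with its parent and loses its fitted attribute, so an instance fitted elsewhere is not the one the copied parent sees.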

How can I handle a different normalization for each split inside a GridSearchCV? What are the best practices?

Here is a Python example:

import pandas as pd
import numpy as np
from sklearn.base import BaseEstimator
from sklearn.base import TransformerMixin
from sklearn.pipeline import Pipeline
# sklearn.grid_search is deprecated; GridSearchCV now lives in model_selection
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit

class DivisorTransform(BaseEstimator,TransformerMixin):
    def __init__(self):
        pass

    def fit(self, X, y=None):
        print(f'{type(self).__name__} id {id(self)} fit')
        self.divisor_ = X.max()
        return self

    def transform(self, X):
        print(f'{type(self).__name__} id {id(self)} transform')
        return X

    def getDivisor(self):
        return self.divisor_

class NormalizationTransform(BaseEstimator,TransformerMixin):
    def __init__(self, divisorTransform, fakeParam):
        self.divTrns = divisorTransform
        self.fakeParam = fakeParam
        print(f'{type(self).__name__} id {id(self)} init saving {type(self.divTrns).__name__} at {id(self.divTrns)}')

    def fit(self, X, y=None):
        print(f'{type(self).__name__} id {id(self)} fit going to fit {type(self.divTrns).__name__} {id(self.divTrns)}')
        self.divisor_ = self.divTrns.fit(X).getDivisor()
        return self

    def transform(self, X):
        print(f'{type(self).__name__} id {id(self)} transform')
        res = X.copy()
        res = res / self.divisor_
        print('_______________________________________')
        print(res)
        return res

    def anti_transform(self, X):
        res = X.copy()
        res = res * self.divisor_
        return res

    def score(self, X, y=None, sample_weight=None):
        return 1


x = pd.DataFrame([[i+j*10 for j in range(3)] for i in range(10)],columns=['A','B','C'])
dvT = DivisorTransform()
print(type(dvT).__name__)
pipe = Pipeline([('divisor',dvT),('normalization',NormalizationTransform(dvT, 0))])
res1 = pipe.fit_transform(x)
params = {'normalization__fakeParam':[0,1]}
gs = GridSearchCV(pipe,params,cv=TimeSeriesSplit(n_splits=3).split(x))
print('Starting Grid Search')
gs.fit(x)

This prints:

Starting Grid Search
NormalizationTransform id 140321510292896 init saving NoneType at 94405154462352
NormalizationTransform id 140321722266344 init saving NoneType at 94405154462352

This illustrates the problem.
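One likely reason for the NoneType lines in the output: the constructor argument `divisorTransform` is stored as `self.divTrns`, while `clone` rebuilds an estimator from `get_params()`, which looks up each parameter by its constructor argument name. One possible way around the shared instance altogether is to let a single transformer learn the divisor in its own `fit`, so that every copy made by the search is fitted on that split's training data only. The sketch below assumes this approach. The name `MaxDivisorScaler`, the target `y`, the `LinearRegression` final step and the parameter grid are all made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin, clone
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit
from sklearn.pipeline import Pipeline


class MaxDivisorScaler(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: learns a per-column divisor in fit()."""

    def fit(self, X, y=None):
        # Column-wise maximum of whatever data this copy is fitted on.
        self.divisor_ = np.asarray(X, dtype=float).max(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X, dtype=float) / self.divisor_

    def inverse_transform(self, X):
        return np.asarray(X, dtype=float) * self.divisor_


x = pd.DataFrame([[i + j * 10 for j in range(3)] for i in range(10)],
                 columns=['A', 'B', 'C'])
y = x['A'] * 2.0  # made-up target, only so the search has something to score

pipe = Pipeline([('normalization', MaxDivisorScaler()),
                 ('model', LinearRegression())])
param_grid = {'model__fit_intercept': [True, False]}  # illustrative grid
gs = GridSearchCV(pipe, param_grid, cv=TimeSeriesSplit(n_splits=3))
gs.fit(x, y)

# Divisor a fresh copy learns on the first training fold alone.
train_idx, _ = next(TimeSeriesSplit(n_splits=3).split(x))
fold_divisor = clone(MaxDivisorScaler()).fit(x.iloc[train_idx]).divisor_

# Divisor of the refitted best pipeline, learned on all of x.
best_divisor = gs.best_estimator_.named_steps['normalization'].divisor_
print(fold_divisor, best_divisor)
```

Because the divisor is learned state rather than a constructor argument, copying the pipeline does not need to carry it over: each copy recomputes it from the data it is fitted on.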

0 answers:

No answers