spark_sklearn's GridSearchCV fails when I call the score method

Posted: 2018-02-02 07:32:33

Tags: scikit-learn pyspark grid-search

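I am trying to use GridSearchCV from the spark_sklearn package instead of the one from sklearn, in order to take advantage of Spark.

However, when I call the estimator's score method, it fails.

I took the example code from http://scikit-learn.org/stable/auto_examples/plot_digits_pipe.html.

The code is as follows: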

def example_ppl():
    import numpy as np
    from pyspark.sql import SparkSession
    from sklearn import linear_model, decomposition, datasets
    from sklearn.pipeline import Pipeline
    # from sklearn.model_selection import GridSearchCV
    from spark_sklearn import GridSearchCV

    # Pipeline: PCA for dimensionality reduction, then logistic regression
    logistic = linear_model.LogisticRegression()
    pca = decomposition.PCA()
    pipe = Pipeline(steps=[('pca', pca), ('logistic', logistic)])

    digits = datasets.load_digits()
    X_digits = digits.data
    y_digits = digits.target

    # Parameter grid: 3 PCA sizes x 3 regularization strengths
    n_components = [20, 40, 64]
    Cs = np.logspace(-4, 4, 3)

    # Create the Spark context
    spark_session = SparkSession.builder.appName('test').getOrCreate()
    sc = spark_session.sparkContext

    estimator = GridSearchCV(sc,
                             estimator=pipe,
                             param_grid=dict(pca__n_components=n_components,
                                             logistic__C=Cs))
    print(type(estimator))
    estimator.fit(X_digits, y_digits)
    # print(estimator.cv_results_)
    estimator.score(X_digits,y_digits)

It throws the following error:

  File "D:/Python_Project/test/sklearn_pyspark.py", line 72, in example_ppl
    estimator.score(X_digits,y_digits)
  File "D:\PyEnvs\test\lib\site-packages\sklearn\model_selection\_search.py", line 436, in score
    score = self.scorer_[self.refit] if self.multimetric_ else self.scorer_
AttributeError: 'GridSearchCV' object has no attribute 'multimetric_'
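The traceback shows that score() itself comes from scikit-learn's BaseSearchCV (sklearn/model_selection/_search.py), which reads self.multimetric_ unconditionally. A plausible explanation, stated as an assumption rather than a confirmed diagnosis, is that spark_sklearn's GridSearchCV replaces fit() with its own implementation written against an older scikit-learn and never sets multimetric_, while score() is still inherited from the newer base class. The minimal pure-Python sketch below reproduces that pattern; BaseSearchSketch and SparkStyleSearchSketch are hypothetical stand-ins, not the real sklearn or spark_sklearn classes.

```python
class BaseSearchSketch:
    """Hypothetical stand-in for scikit-learn's BaseSearchCV (heavily simplified)."""

    def fit(self, X, y):
        # In this sketch, the base fit() records whether several metrics
        # were requested (assumed to mirror newer scikit-learn versions).
        self.multimetric_ = False
        self.best_estimator_ = "refit model"
        return self

    def score(self, X, y):
        # Mirrors line 436 in the traceback: multimetric_ is read
        # unconditionally, so fit() must have set it.
        mode = "multi-metric" if self.multimetric_ else "single-metric"
        return mode


class SparkStyleSearchSketch(BaseSearchSketch):
    """Hypothetical stand-in for a subclass that replaces fit() entirely."""

    def fit(self, X, y):
        # Sets the attributes an older fit() knew about, never calls
        # super().fit(), and therefore never sets multimetric_.
        self.best_estimator_ = "refit model"
        return self


print(BaseSearchSketch().fit(None, None).score(None, None))  # single-metric

try:
    SparkStyleSearchSketch().fit(None, None).score(None, None)
except AttributeError as exc:
    # Same kind of failure as in the traceback above.
    print("AttributeError:", exc)
```

If this is what happens, the fix belongs in spark_sklearn (or in pinning a scikit-learn version it was written for), not in the pipeline code.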

Is this a problem with spark_sklearn, or am I missing something in my code?
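A possible workaround, sketched under assumptions: with refit=True and scoring=None (both defaults, as in the code above), scikit-learn's own GridSearchCV.score passes best_estimator_ to a scorer that simply calls its score method, so calling best_estimator_.score directly should give the same number without touching multimetric_. The helper score_search is hypothetical (not part of sklearn or spark_sklearn), and it assumes spark_sklearn's fit sets best_estimator_ after refitting.

```python
def score_search(search, X, y):
    """Score a fitted grid search, tolerating a missing ``multimetric_``.

    Hypothetical helper. Assumes refit=True and the default scoring=None,
    in which case scoring the refit best estimator directly matches what
    the inherited GridSearchCV.score would compute.
    """
    if hasattr(search, "multimetric_"):
        # Attribute present: the inherited score() can be used as intended.
        return search.score(X, y)
    if not hasattr(search, "best_estimator_"):
        raise ValueError("no best_estimator_; fit the search with refit=True first")
    # Fall back to the refit best estimator's own score() method.
    return search.best_estimator_.score(X, y)
```

Usage in the example would be score_search(estimator, X_digits, y_digits). Note that this bypasses any custom scoring argument; if one is passed to GridSearchCV, that scorer would have to be applied to best_estimator_ instead.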

0 answers