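I am trying to use `GridSearchCV` from the `spark_sklearn` package instead of the one in `sklearn`, so that the grid search can take advantage of Spark. However, when I call the `score` method of the fitted estimator, it fails.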
I took the example code from http://scikit-learn.org/stable/auto_examples/plot_digits_pipe.html. The code is as follows:
```python
def example_ppl():
    import numpy as np
    from pyspark.sql import SparkSession
    from sklearn import linear_model, decomposition, datasets
    from sklearn.pipeline import Pipeline
    # from sklearn.model_selection import GridSearchCV
    from spark_sklearn import GridSearchCV

    logistic = linear_model.LogisticRegression()
    pca = decomposition.PCA()
    pipe = Pipeline(steps=[('pca', pca), ('logistic', logistic)])

    digits = datasets.load_digits()
    X_digits = digits.data
    y_digits = digits.target

    n_components = [20, 40, 64]
    Cs = np.logspace(-4, 4, 3)

    # Create a Spark session and get its SparkContext
    spark_session = SparkSession.builder.appName('test').getOrCreate()
    sc = spark_session.sparkContext

    estimator = GridSearchCV(sc,
                             estimator=pipe,
                             param_grid=dict(pca__n_components=n_components,
                                             logistic__C=Cs))
    print(type(estimator))
    estimator.fit(X_digits, y_digits)
    # print(estimator.cv_results_)
    estimator.score(X_digits, y_digits)
```
It raises the following error:
```
  File "D:/Python_Project/test/sklearn_pyspark.py", line 72, in example_ppl
    estimator.score(X_digits,y_digits)
  File "D:\PyEnvs\test\lib\site-packages\sklearn\model_selection\_search.py", line 436, in score
    score = self.scorer_[self.refit] if self.multimetric_ else self.scorer_
AttributeError: 'GridSearchCV' object has no attribute 'multimetric_'
```
Is this a problem in `spark_sklearn`, or am I missing something in my code?
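The traceback shows that `score` is inherited from scikit-learn's `model_selection/_search.py` and reads `self.multimetric_`. One plausible explanation (an assumption, not confirmed here) is that the installed `spark_sklearn` was written against an older scikit-learn whose search classes never set `multimetric_` during `fit`, so the inherited `score` finds the attribute missing. The sketch below reproduces that failure mode with hypothetical pure-Python stand-in classes (`BaseSearch`, `OldStyleSearch`, `NewStyleSearch`, `ConstantModel`); none of them is the real sklearn or spark_sklearn code. It also shows scoring the refitted best estimator directly. With the real objects the analogous call would be `estimator.best_estimator_.score(X_digits, y_digits)`, assuming the default `refit=True` populates `best_estimator_`.

```python
# Hypothetical stand-ins, NOT the real sklearn / spark_sklearn classes.


class ConstantModel:
    """Toy 'refitted best estimator' whose score is always 1.0."""

    def score(self, X, y):
        return 1.0


class BaseSearch:
    """Plays the role of a base class whose score() reads multimetric_."""

    def score(self, X, y):
        # Mirrors the failing line from the traceback.
        scorer = self.scorer_[self.refit] if self.multimetric_ else self.scorer_
        return scorer(self.best_estimator_, X, y)


class NewStyleSearch(BaseSearch):
    """fit() sets every attribute that score() relies on."""

    def fit(self, X, y):
        self.refit = True
        self.multimetric_ = False
        self.scorer_ = lambda est, X, y: est.score(X, y)
        self.best_estimator_ = ConstantModel()
        return self


class OldStyleSearch(BaseSearch):
    """fit() written before multimetric_ existed: it never sets it."""

    def fit(self, X, y):
        self.refit = True
        self.scorer_ = lambda est, X, y: est.score(X, y)
        self.best_estimator_ = ConstantModel()
        return self


if __name__ == "__main__":
    X, y = [[0], [1]], [0, 1]
    print(NewStyleSearch().fit(X, y).score(X, y))  # 1.0

    old = OldStyleSearch().fit(X, y)
    try:
        old.score(X, y)
    except AttributeError as exc:
        print(type(exc).__name__)  # AttributeError

    # Scoring the refitted best estimator directly bypasses multimetric_.
    print(old.best_estimator_.score(X, y))  # 1.0
```

Delegating to `best_estimator_` only matches the search object's own `score` when the default scoring (the estimator's `score` method) is used; with a custom `scoring` argument the two could differ.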