Error with paramGrid in PySpark

Date: 2018-05-04 12:20:18

Tags: apache-spark pyspark pyspark-sql apache-spark-ml

I am using paramGrid to fine-tune my model parameters. Here is the code:

```python
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import ParamGridBuilder, TrainValidationSplit

windowSize = 5
minCount = 10
vectorSize = 300
maxIter = [10, 100, 1000]
regParam = [0.1, 0.01]

paramGrid = ParamGridBuilder() \
    .addGrid(q1w2model.setWindowSize, windowSize) \
    .addGrid(q1w2model.setMinCount, minCount) \
    .addGrid(q2w2model.setWindowSize, windowSize) \
    .addGrid(q2w2model.setMinCount, minCount) \
    .addGrid(q1w2model.setVectorSize, vectorSize) \
    .addGrid(q2w2model.setVectorSize, vectorSize) \
    .addGrid(lr.setMaxIter, maxIter) \
    .addGrid(lr.setRegParam, regParam) \
    .build()

tvs = TrainValidationSplit(estimator=pipeline,
                           estimatorParamMaps=paramGrid,
                           evaluator=BinaryClassificationEvaluator(),
                           trainRatio=0.8)

model = tvs.fit(train)  # model is the model with the combination of parameters that performed best
```

Here is the traceback:

```
File "/home/PycharmProjects/untitled1/quora_feaures_pyspark.py", line 406
    .addGrid(lr.setRegParam, regParam) \
File "/usr/local/lib/python2.7/dist-packages/pyspark/ml/tuning.py", line 115, in build
    return [dict(zip(keys, prod)) for prod in itertools.product(*grid_values)]
TypeError: 'int' object is not iterable
```
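As the traceback shows, the error is raised in build(), when the collected grid values are unpacked into itertools.product, not when addGrid is called. The failing expression from the last frame can be reproduced in plain Python. This is a minimal sketch: the dictionary with string keys is a hypothetical stand-in for the Param-keyed grid that ParamGridBuilder accumulates internally.

```python
import itertools

# Hypothetical stand-in for the grid collected by ParamGridBuilder:
# string keys replace Param objects; values are what was passed to addGrid.
grid = {
    "windowSize": 5,              # a bare int, as in the question
    "maxIter": [10, 100, 1000],
    "regParam": [0.1, 0.01],
}

keys = list(grid.keys())
grid_values = list(grid.values())

error_message = None
try:
    # Same expression as the last traceback frame (tuning.py, in build)
    param_maps = [dict(zip(keys, prod)) for prod in itertools.product(*grid_values)]
except TypeError as exc:
    # itertools.product needs every argument to be iterable; 5 is not
    error_message = str(exc)

print(error_message)
```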

1 answer

Answer 0 (score: 1)

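The ParamGridBuilder.addGrid method expects an iterable of values for each parameter, but you are passing windowSize, minCount and vectorSize to addGrid as plain integers. To fix the error, change these variables into lists, like your other grid-search parameters (maxIter and regParam), for example windowSize = [5].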
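The fix can be checked the same way. In the minimal sketch below (again with hypothetical string keys standing in for Param objects), each fixed setting is wrapped in a one-element list, so itertools.product succeeds and yields one parameter map per combination: 1 x 1 x 1 x 1 x 1 x 1 x 3 x 2 = 6.

```python
import itertools

# Hypothetical string keys stand in for Param objects.
# Fixed settings are one-element lists; only lr.* actually varies.
grid = {
    "q1.windowSize": [5],
    "q1.minCount": [10],
    "q2.windowSize": [5],
    "q2.minCount": [10],
    "q1.vectorSize": [300],
    "q2.vectorSize": [300],
    "lr.maxIter": [10, 100, 1000],
    "lr.regParam": [0.1, 0.01],
}

keys = list(grid)
# Same construction as ParamGridBuilder.build: one dict per combination
param_maps = [dict(zip(keys, prod)) for prod in itertools.product(*grid.values())]

print(len(param_maps))  # 3 maxIter values x 2 regParam values
print(param_maps[0])
```

In the question's PySpark code this corresponds to windowSize = [5], minCount = [10] and vectorSize = [300]. Separately, the ParamGridBuilder.addGrid documentation says its first argument must be a Param instance belonging to an estimator or transformer, such as lr.maxIter (or q1w2model.windowSize, assuming q1w2model is a Word2Vec estimator), rather than a setter method such as lr.setMaxIter. Parameters that take only a single value can also simply be set on the estimators directly instead of being added to the grid.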