I am using a paramGrid to tune my model's parameters. Here is the code:
windowSize = 5
minCount = 10
vectorSize = 300
maxIter = [10, 100, 1000]
regParam = [0.1, 0.01]
paramGrid = ParamGridBuilder() \
    .addGrid(q1w2model.setWindowSize, windowSize) \
    .addGrid(q1w2model.setMinCount, minCount) \
    .addGrid(q2w2model.setWindowSize, windowSize) \
    .addGrid(q2w2model.setMinCount, minCount) \
    .addGrid(q1w2model.setVectorSize, vectorSize) \
    .addGrid(q2w2model.setVectorSize, vectorSize) \
    .addGrid(lr.setMaxIter, maxIter) \
    .addGrid(lr.setRegParam, regParam) \
    .build()
tvs = TrainValidationSplit(estimator=pipeline,
                           estimatorParamMaps=paramGrid,
                           evaluator=BinaryClassificationEvaluator(),
                           trainRatio=0.8)
model = tvs.fit(train)  # model is the model with the combination of parameters that performed best
Here is the traceback:
  File "/home/PycharmProjects/untitled1/quora_feaures_pyspark.py", line 406
    .addGrid(lr.setRegParam, regParam) \
  File "/usr/local/lib/python2.7/dist-packages/pyspark/ml/tuning.py", line 115, in build
    return [dict(zip(keys, prod)) for prod in itertools.product(*grid_values)]
TypeError: 'int' object is not iterable
Answer 0 (score: 1)
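The ParamGridBuilder.addGrid method expects an iterable of values for each parameter, but you are passing windowSize, minCount, and vectorSize to addGrid as plain integers. Change these variables to lists, like the other grid-search parameters (maxIter and regParam), to resolve the error.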
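To see why a bare integer fails, here is a minimal sketch that mimics the cross-product line shown in the traceback using only the standard library. The helper name build_grid and the string keys are hypothetical stand-ins for illustration; this is not PySpark's actual implementation, only the same itertools.product pattern.

```python
import itertools


def build_grid(param_grid):
    """Hypothetical stand-in for the cross-product step in the traceback."""
    keys = list(param_grid.keys())
    grid_values = [param_grid[k] for k in keys]
    # Same pattern as the failing line: every value must be iterable.
    return [dict(zip(keys, prod)) for prod in itertools.product(*grid_values)]


# Fixed version: single values are wrapped in one-element lists.
fixed = build_grid({
    "windowSize": [5],
    "minCount": [10],
    "vectorSize": [300],
    "maxIter": [10, 100, 1000],
    "regParam": [0.1, 0.01],
})

# Broken version: a bare int reproduces the TypeError from the question.
try:
    build_grid({"windowSize": 5, "maxIter": [10, 100, 1000]})
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(len(fixed))      # number of parameter combinations
print(error_message)
```

In the question's code, the equivalent fix is to define windowSize = [5], minCount = [10], and vectorSize = [300]. Separately, PySpark documents the signature as addGrid(param, values), where param is a Param attribute such as lr.maxIter rather than a setter method such as lr.setMaxIter, so that may also be worth checking.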