I am using MultilayerPerceptronClassifier from pyspark.ml.classification. My dataset has 11 features:
['fixed acidity',
'volatile acidity',
'citric acid',
'residual sugar',
'chlorides',
'free sulfur dioxide',
'total sulfur dioxide',
'density',
'pH',
'sulphates',
'alcohol']
and my label column contains 7 classes:
+-----+
|label|
+-----+
|    6|
|    3|
|    5|
|    9|
|    4|
|    8|
|    7|
+-----+
I am training the MultilayerPerceptronClassifier model on this dataset in pyspark. Following the pyspark ML convention, I specified my neural network architecture in this format:
# specify layers for the neural network:
# input layer of size 11 (features), two intermediate of size 5 and 4
# and output of size 7 (classes)
layers = [11,5,4,7]
I then define the classifier:
from pyspark.ml.classification import MultilayerPerceptronClassifier
clf = MultilayerPerceptronClassifier(labelCol='label', layers=layers)
Now I fit it on the training data:
cvModel = clf.fit(train_data)
Can anyone tell me why I am getting this error?
Error:
Py4JJavaError: An error occurred while calling o241.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 47.0 failed 1 times, most recent failure: Lost task 0.0 in stage 47.0 (TID 812, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 8
at org.apache.spark.ml.classification.LabelConverter$.encodeLabeledPoint(MultilayerPerceptronClassifier.scala:121)
at org.apache.spark.ml.classification.MultilayerPerceptronClassifier$$anonfun$3.apply(MultilayerPerceptronClassifier.scala:238)
at org.apache.spark.ml.classification.MultilayerPerceptronClassifier$$anonfun$3.apply(MultilayerPerceptronClassifier.scala:238)
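One possible reading of the trace, offered as a sketch rather than a confirmed diagnosis: the failure is raised in LabelConverter.encodeLabeledPoint with index 8, which suggests the label value is used directly as an index into a vector whose length equals the output layer size (7 here). If that is the case, labels would need to lie in the range 0 to 6, while this dataset has labels from 3 to 9. The plain-Python snippet below is not Spark code; the helper names `out_of_range_labels` and `remap_labels` are hypothetical and only illustrate the check and a possible remapping.

```python
# Labels and layer sizes copied from the question above.
labels = [6, 3, 5, 9, 4, 8, 7]
layers = [11, 5, 4, 7]


def out_of_range_labels(labels, layers):
    """Return the distinct labels that cannot index a vector of output-layer size.

    Assumption: the classifier treats each label as an index in [0, layers[-1]).
    """
    output_size = layers[-1]
    return sorted(label for label in set(labels) if not 0 <= label < output_size)


def remap_labels(labels):
    """Map the distinct labels to consecutive indices 0..k-1 (hypothetical helper)."""
    return {label: index for index, label in enumerate(sorted(set(labels)))}


bad = out_of_range_labels(labels, layers)
mapping = remap_labels(labels)

print("labels outside [0, %d): %s" % (layers[-1], bad))
print("one possible remapping:", mapping)
```

In pyspark itself, StringIndexer from pyspark.ml.feature can produce a column of consecutive label indices. Note that by default it orders indices by label frequency, not by label value, so the resulting mapping may differ from the one sketched here.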