Question

I am using MultilayerPerceptronClassifier from pyspark.ml.classification. My dataset has 11 features:

['fixed acidity',
 'volatile acidity',
 'citric acid',
 'residual sugar',
 'chlorides',
 'free sulfur dioxide',
 'total sulfur dioxide',
 'density',
 'pH',
 'sulphates',
 'alcohol']

and my label column contains 7 classes:

+-----+
|label|
+-----+
|    6|
|    3|
|    5|
|    9|
|    4|
|    8|
|    7|
+-----+

I am training a MultilayerPerceptronClassifier model on this dataset in pyspark. Following the pyspark ML convention, I specified the neural network architecture in this format:

# specify layers for the neural network:
# input layer of size 11 (features), two intermediate of size 5 and 4
# and output of size 7 (classes)
layers = [11,5,4,7]
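A quick sanity check on this layer specification can be sketched in plain Python. It assumes (this is an assumption, though the stack trace in this question is consistent with it) that the classifier treats each label as a zero-based class index, so every label must be smaller than the output layer size. The helper name `check_layers` is hypothetical and is not part of pyspark.

```python
# Values taken from the question.
n_features = 11
labels = [6, 3, 5, 9, 4, 8, 7]
layers = [11, 5, 4, 7]


def check_layers(layers, n_features, labels):
    """Hypothetical helper: list mismatches between layer sizes and the data.

    Assumption: labels are used as zero-based class indices, so each label
    must satisfy 0 <= label < layers[-1].
    """
    problems = []
    if layers[0] != n_features:
        problems.append(
            f"input layer is {layers[0]}, but there are {n_features} features"
        )
    if min(labels) < 0 or max(labels) >= layers[-1]:
        problems.append(
            f"labels range from {min(labels)} to {max(labels)}, "
            f"but the output layer only covers indices 0..{layers[-1] - 1}"
        )
    return problems


problems = check_layers(layers, n_features, labels)
for p in problems:
    print(p)
```

With the values above, the input layer matches the 11 features, but the labels 7, 8 and 9 fall outside the index range covered by an output layer of size 7.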

I then define the classifier:

from pyspark.ml.classification import MultilayerPerceptronClassifier
clf = MultilayerPerceptronClassifier(labelCol='label', layers=layers)

Now I fit it on the training data:

cvModel = clf.fit(train_data)

Can anyone tell me why I am getting this error?

Error:

Py4JJavaError: An error occurred while calling o241.fit.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 47.0 failed 1 times, most recent failure: Lost task 0.0 in stage 47.0 (TID 812, localhost, executor driver): java.lang.ArrayIndexOutOfBoundsException: 8
    at org.apache.spark.ml.classification.LabelConverter$.encodeLabeledPoint(MultilayerPerceptronClassifier.scala:121)
    at org.apache.spark.ml.classification.MultilayerPerceptronClassifier$$anonfun$3.apply(MultilayerPerceptronClassifier.scala:238)
    at org.apache.spark.ml.classification.MultilayerPerceptronClassifier$$anonfun$3.apply(MultilayerPerceptronClassifier.scala:238)
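
The `ArrayIndexOutOfBoundsException: 8` raised from `LabelConverter.encodeLabeledPoint` is consistent with the label value being used directly as an index into an output vector whose length equals the last entry of `layers` (7 here). The plain-Python sketch below imitates that behaviour; it is an illustration of the suspected mechanism, not Spark's actual code, and `encode_label` is a hypothetical name. It then shows one way to remap the labels 3..9 onto the indices 0..6.

```python
def encode_label(label, num_classes):
    """Hypothetical stand-in for a one-hot label encoder.

    Assumption: the label is used as a zero-based index into a vector of
    length num_classes, as the stack trace suggests.
    """
    one_hot = [0.0] * num_classes
    one_hot[int(label)] = 1.0  # fails when label >= num_classes
    return one_hot


labels = [6, 3, 5, 9, 4, 8, 7]  # distinct label values from the question
num_classes = 7                 # size of the output layer in `layers`

# Label 8 does not fit into a vector of length 7.
try:
    encode_label(8, num_classes)
    failed = False
except IndexError:
    failed = True

# Remap the raw labels to contiguous indices 0..6 (sorted order is an
# arbitrary choice made for this sketch).
label_to_index = {lab: i for i, lab in enumerate(sorted(set(labels)))}
indexed = [label_to_index[lab] for lab in labels]

# Every remapped label now fits the 7-unit output layer.
encoded = [encode_label(i, num_classes) for i in indexed]
```

In pyspark itself, a similar remapping can be obtained with `StringIndexer` from `pyspark.ml.feature`, passing its output column as `labelCol`; note that by default it orders indices by label frequency rather than by value. Alternatively, keeping the raw labels and making the output layer at least 10 units wide should also avoid the out-of-range index, at the cost of unused output units.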

Error from the MLP classifier in the pyspark fit method

0 answers: