How to resolve the error "AttributeError: 'SparkSession' object has no attribute 'serializer'"?

Asked: 2018-11-15 19:54:20

Tags: apache-spark pyspark pyspark-sql

I am working with a PySpark dataframe. I have some code that tries to convert the dataframe to an RDD, but I get the following error:

AttributeError: 'SparkSession' object has no attribute 'serializer'

What might be the problem?

from pyspark.ml.classification import NaiveBayes

# Split the data into training and test sets.
training, test = rescaledData.randomSplit([0.8, 0.2])

# Train a naive Bayes model.
nb = NaiveBayes(smoothing=1.0, modelType="multinomial")
model = nb.fit(rescaledData)

# Make predictions and test accuracy.
predictionAndLabel = test.rdd.map(lambda p: (model.predict(p.features), p.label))
accuracy = 1.0 * predictionAndLabel.filter(lambda pl: pl[0] == pl[1]).count() / test.count()
print('model accuracy {}'.format(accuracy))

Does anyone have any insight into why the statement test.rdd causes the error? The dataframe contains Row objects of (label, features).
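For reference, a minimal sketch of the kind of dataframe described above; the column values here are illustrative assumptions, not taken from the original pipeline:

from pyspark.ml.linalg import Vectors
from pyspark.sql import Row, SparkSession

spark = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()

# Hypothetical rows of (label, features); in the question, rescaledData
# would come from a feature-rescaling pipeline (e.g. TF-IDF).
rows = [Row(label=0.0, features=Vectors.dense([0.0, 1.0])),
        Row(label=1.0, features=Vectors.dense([1.0, 0.0]))]
df = spark.createDataFrame(rows)

# .rdd converts the DataFrame into an RDD of Row objects; this is the
# call that raises the AttributeError when the SQL context is mis-initialized.
rdd = df.rdd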

Thanks

1 Answer:

Answer 0 (score: 0):

Apologies, as I don't have enough rep to comment. The answer linked below may resolve this, as it pertains to the way the SQL context is initialized:

https://stackoverflow.com/a/54738984/8534357

When initializing the Spark session and the SQL context, I was doing this, which is not right:

from pyspark.sql import SparkSession, SQLContext

sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sc)  # wrong: passes a SparkSession where a SparkContext is expected

This problem was resolved by doing this instead:

sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
# Pass the underlying SparkContext explicitly, alongside the SparkSession.
sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)
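With the SQL context built from the real SparkContext, the serializer lookup that df.rdd performs should no longer land on the session object. A minimal end-to-end sketch under that assumption (names and data are illustrative):

from pyspark.sql import Row, SparkSession, SQLContext

spark = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sparkContext=spark.sparkContext, sparkSession=spark)

# With the context initialized correctly, .rdd works as expected.
df = spark.createDataFrame([Row(label=0.0, features=[0.0, 1.0])])
print(df.rdd.map(lambda p: (p.label, p.features)).collect())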