I am working with a PySpark DataFrame. I have some code that tries to convert the DataFrame to an RDD, but I get the following error:
AttributeError: 'SparkSession' object has no attribute 'serializer'
What could be the problem?
from pyspark.ml.classification import NaiveBayes

training, test = rescaledData.randomSplit([0.8, 0.2])

# Train a naive Bayes model.
nb = NaiveBayes(smoothing=1.0, modelType="multinomial")
model = nb.fit(rescaledData)

# Make predictions and test accuracy.
predictionAndLabel = test.rdd.map(lambda p: (model.predict(p.features), p.label))
accuracy = 1.0 * predictionAndLabel.filter(lambda pl: pl[0] == pl[1]).count() / test.count()
print('model accuracy {}'.format(accuracy))
Does anyone have any insight into why the statement test.rdd causes the error? The DataFrame contains Row objects of (label, features).
Thanks
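(For context, a DataFrame of (label, features) rows like rescaledData is commonly produced with a TF-IDF pipeline; the sketch below is only an assumption about how the data might have been prepared, and the input rows and column names are hypothetical rather than taken from the question.)

from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, HashingTF, IDF

spark = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()

# Hypothetical raw input: (label, sentence) rows.
raw = spark.createDataFrame(
    [(0.0, "spark converts dataframes to rdds"), (1.0, "naive bayes needs label and features")],
    ["label", "sentence"])

words = Tokenizer(inputCol="sentence", outputCol="words").transform(raw)
tf = HashingTF(inputCol="words", outputCol="rawFeatures", numFeatures=1 << 10).transform(words)
idfModel = IDF(inputCol="rawFeatures", outputCol="features").fit(tf)
# rescaledData now holds Row(label, features), matching the question's description.
rescaledData = idfModel.transform(tf).select("label", "features")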
Answer 0 (score: 0)
Apologies as I don't have enough rep to comment. The answer to this question may resolve this, as this pertains to the way the SQL context is initiated:
https://stackoverflow.com/a/54738984/8534357
When I initiated the SparkSession and the SQLContext, I was doing this, which is not right:
from pyspark.sql import SparkSession, SQLContext
sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
# Not right: SQLContext's first argument expects a SparkContext, not a SparkSession.
sqlContext = SQLContext(sc)
This problem was resolved by doing this instead:
sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
# Pass the underlying SparkContext explicitly so SQLContext receives a real SparkContext.
sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)
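With the SQLContext built from sc.sparkContext, converting a DataFrame back to an RDD no longer trips over the serializer attribute. A minimal check, where the toy data is an assumption and not part of the original code:

from pyspark.sql import SparkSession, SQLContext

sc = SparkSession.builder.appName('App Name').master("local[*]").getOrCreate()
sqlContext = SQLContext(sparkContext=sc.sparkContext, sparkSession=sc)

# Toy (label, value) DataFrame just to exercise the DataFrame -> RDD path.
df = sqlContext.createDataFrame([(0.0, 1.0), (1.0, 2.0)], ["label", "value"])
print(df.rdd.take(2))  # previously raised: 'SparkSession' object has no attribute 'serializer'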