Using pyspark

Time: 2017-07-09 20:57:51

Tags: python apache-spark pyspark spark-streaming apache-spark-mllib

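I want to convert a DStream into a DataFrame so that I can apply the same transformations to that DataFrame and call a NaiveBayesModel to predict target probabilities. I am using Apache Spark 2.1.1, and the DStream is built from socketTextStream. I tried calling the DStream's foreachRDD function, but it does not work.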

I get the following error message:

AttributeError: 'RDD' object has no attribute '_jdf'

My idea is to convert the DStream to a Spark DataFrame and apply the transformations using the following approach:

import json

# sc, ssc and the HashingTF / NaiveBayesModel imports are set up earlier (not shown)

def predict(rdd):
    count = rdd.count()
    if count > 0:
        hashingTF = HashingTF(numFeatures=1000)
        features = hashingTF.transform(rdd)  # the RDD is passed straight to transform()
        result = model.transform(features)
        return result.probability
    else:
        print("No data received")

model = NaiveBayesModel.load(sc, "ML_models/NaiveClassifier/naiveBayesClassifier-2010-09-10-08-51-25")
lines = ssc.socketTextStream("localhost", 9999)
tweets = lines.map(lambda v: json.loads(v))  # each line is a JSON-encoded tweet
text_dstream = tweets.map(lambda tweet: tweet['text'])
df = text_dstream.foreachRDD(lambda rdd: predict(rdd))
ssc.start()             # Start the computation
ssc.awaitTermination()  # Wait for the computation to terminate
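One possible direction, offered as a sketch and not a confirmed fix: the _jdf error is consistent with handing an RDD to a pyspark.ml transformer, which expects a DataFrame. The version below turns each micro-batch into a DataFrame before hashing. It rests on assumptions that are not in the question: the model was trained and saved with pyspark.ml (so it is loaded with pyspark.ml.classification.NaiveBayesModel.load(path), without sc), it reads a column named "features", and training used the same whitespace tokenization with 1000 hash buckets. The names extract_text, "text" and "words" are hypothetical.

```python
import json


def extract_text(line):
    """Return the 'text' field of a JSON-encoded tweet, or None if unusable."""
    try:
        tweet = json.loads(line)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return None
    if not isinstance(tweet, dict):
        return None
    return tweet.get("text")


def predict(rdd, model):
    """Score one micro-batch of tweet texts; meant to be called from foreachRDD."""
    # PySpark is imported here so extract_text stays usable without Spark.
    from pyspark.ml.feature import HashingTF, Tokenizer
    from pyspark.sql import SparkSession

    if rdd.isEmpty():
        print("No data received")
        return

    # Reuse (or lazily create) a session; pyspark.ml works on DataFrames.
    spark = SparkSession.builder.config(conf=rdd.context.getConf()).getOrCreate()
    df = spark.createDataFrame(rdd.map(lambda text: (text,)), ["text"])

    # Assumption: these steps mirror the feature extraction used in training.
    words = Tokenizer(inputCol="text", outputCol="words").transform(df)
    hashing_tf = HashingTF(numFeatures=1000, inputCol="words", outputCol="features")
    features = hashing_tf.transform(words)

    # Assumption: model is a pyspark.ml NaiveBayesModel, which adds
    # "probability" and "prediction" columns to its output.
    model.transform(features).select("text", "probability", "prediction").show()
```

Under these assumptions the stream would be wired up as lines.map(extract_text).filter(lambda t: t is not None).foreachRDD(lambda rdd: predict(rdd, model)). Since foreachRDD returns None, the predictions have to be consumed inside predict (shown, written out, and so on) and cannot be captured in a variable such as df.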

0 answers:

No answers