Using MQTT with PySpark Streaming

Time: 2016-09-06 12:29:20

Tags: apache-spark spark-streaming mqtt

I am new to Spark and MQTT. I am trying to use the MQTTUtils example code that I found online and saved as wordcount.py:

import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.mqtt import MQTTUtils

if __name__ == "__main__":
    if len(sys.argv) != 3:
        # Works on both Python 2 and Python 3, unlike `print >> sys.stderr`.
        sys.stderr.write("Usage: mqtt_wordcount.py <broker url> <topic>\n")
        sys.exit(-1)

    sc = SparkContext(appName="PythonStreamingMQTTWordCount")
    ssc = StreamingContext(sc, 1)

    brokerUrl = sys.argv[1]
    topic = sys.argv[2]

    lines = MQTTUtils.createStream(ssc, brokerUrl, topic)
    counts = lines.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a+b)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()

I followed the instructions to install the mosquitto broker (it is working), downloaded spark-streaming-mqtt-assembly_2.11-1.6.2.jar, and ran the Python script with the following command:

~$ spark-submit --jars spark-streaming-mqtt-assembly_*.jar wordcount.py

But it shows this error:

from pyspark.streaming.mqtt import MQTTUtils

ImportError: No module named mqtt

Am I missing anything here? Thanks.

1 answer:

Answer 0 (score: 3)

For Spark 2.* we can use MQTT in Structured Streaming by including the Bahir JAR.
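A likely cause of the ImportError is that the MQTT connector, including the pyspark.streaming.mqtt module, moved out of Spark and into Apache Bahir as of Spark 2.0, so a Spark 2.* installation no longer ships that module even when a 1.6.2 assembly JAR is passed with --jars. A minimal sketch to check which modules your environment actually provides (the helper name module_available is hypothetical, not part of any Spark API):

```python
import importlib


def module_available(name):
    """Return True if `name` can be imported in this Python environment."""
    try:
        importlib.import_module(name)
        return True
    except ImportError:
        return False


if __name__ == "__main__":
    # Report each level of the package path separately to see where the import breaks.
    for name in ("pyspark", "pyspark.streaming", "pyspark.streaming.mqtt"):
        status = "found" if module_available(name) else "missing"
        print("{}: {}".format(name, status))
```

Run it with spark-submit rather than plain python, so that the check uses the same pyspark that your job uses.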

Connecting to an MQTT broker from pyspark:

(spark
    .readStream
    .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
    .option("topic","mytopic")
    .load("tcp://{}".format(broker_uri)))
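A slightly fuller sketch of the snippet above, wrapped as a script that takes the broker host and topic as arguments. It assumes the broker listens on the default MQTT port 1883 and that the Bahir MQTT source is on the classpath; the helper names broker_url, read_mqtt and main are hypothetical. The columns of the resulting DataFrame depend on the Bahir version, so this version only prints each micro-batch to the console.

```python
import sys

# Source provider class name taken from the answer above.
MQTT_PROVIDER = "org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider"


def broker_url(host, port=1883):
    """Build the tcp:// URL passed to .load(); 1883 is an assumed default port."""
    return "tcp://{}:{}".format(host, port)


def read_mqtt(spark, host, topic, port=1883):
    """Return a streaming DataFrame backed by the Bahir MQTT source."""
    return (spark
            .readStream
            .format(MQTT_PROVIDER)
            .option("topic", topic)
            .load(broker_url(host, port)))


def main(host, topic):
    # Imported here so the helpers above can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("MQTTStructuredStreaming").getOrCreate()
    messages = read_mqtt(spark, host, topic)
    # Print every micro-batch as it arrives and block until the query stops.
    query = messages.writeStream.format("console").start()
    query.awaitTermination()


if __name__ == "__main__":
    if len(sys.argv) == 3:
        main(sys.argv[1], sys.argv[2])
    else:
        sys.stderr.write("Usage: mqtt_structured.py <broker host> <topic>\n")
```

One way to include the Bahir JAR is spark-submit --packages org.apache.bahir:spark-sql-streaming-mqtt_2.11:<version> mqtt_structured.py localhost mytopic, where the version and the Scala suffix are placeholders that need to match your Spark build.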