I am new to Spark and MQTT. I am trying to use the MQTTUtils example code I found online, which I saved as wordcount.py:
import sys

from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.mqtt import MQTTUtils

if __name__ == "__main__":
    if len(sys.argv) != 3:
        print >> sys.stderr, "Usage: mqtt_wordcount.py <broker url> <topic>"
        exit(-1)

    sc = SparkContext(appName="PythonStreamingMQTTWordCount")
    ssc = StreamingContext(sc, 1)

    brokerUrl = sys.argv[1]
    topic = sys.argv[2]

    lines = MQTTUtils.createStream(ssc, brokerUrl, topic)
    counts = lines.flatMap(lambda line: line.split(" ")) \
        .map(lambda word: (word, 1)) \
        .reduceByKey(lambda a, b: a+b)
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
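I followed the instructions and installed the mosquitto broker (it is working), downloaded spark-streaming-mqtt-assembly_2.11-1.6.2.jar, and ran the Python script with the following command:
~$ spark-submit --jars spark-streaming-mqtt-assembly_*.jar wordcount.py
But it fails with this error:
from pyspark.streaming.mqtt import MQTTUtils
ImportError: No module named mqtt
Am I missing anything here? Thanks.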
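Answer 0 (score: 3)
In Spark 2.0 the MQTT connector moved out of Spark into Apache Bahir, so pyspark.streaming.mqtt is not available there. For Spark 2.*, we can use MQTT in Structured Streaming by including the Bahir JAR.
To connect to an MQTT broker from pyspark: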
(spark
    .readStream
    .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
    .option("topic", "mytopic")
    .load("tcp://{}".format(broker_uri)))
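To tie this back to the word count in the question, here is a rough sketch of how the same count might look on top of that source. The helper names `broker_url` and `start_word_count` are made up for illustration, and the column name `value` is an assumption about the schema the Bahir source exposes; it can differ between Bahir releases, so treat this as a starting point rather than a verified solution.

```python
def broker_url(host, port=1883):
    """Build the tcp:// URL passed to load(); 1883 is the usual MQTT port."""
    return "tcp://{}:{}".format(host, port)


def start_word_count(spark, url, topic):
    """Start a streaming word count over messages from one MQTT topic.

    `spark` is an existing SparkSession. Returns the running StreamingQuery.
    """
    # Imported here so broker_url() stays usable without Spark installed.
    from pyspark.sql.functions import col, explode, split

    messages = (spark
                .readStream
                .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
                .option("topic", topic)
                .load(url))

    # Assumption: the message text is a string column named "value".
    # Some Bahir releases expose a binary "payload" column instead; check
    # messages.printSchema() and adjust the column name and cast if needed.
    words = messages.select(explode(split(col("value"), " ")).alias("word"))
    counts = words.groupBy("word").count()

    # "complete" output mode reprints the full table of counts on each trigger.
    return (counts.writeStream
            .outputMode("complete")
            .format("console")
            .start())
```

With a SparkSession named `spark`, calling `start_word_count(spark, broker_url("localhost"), "mytopic").awaitTermination()` should print running counts to the console. The job has to be launched with the Bahir connector on the classpath, for example with `spark-submit --packages org.apache.bahir:spark-sql-streaming-mqtt_2.11:<version>`, where the Scala suffix and version need to match your Spark build.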