I have a PySpark job, InitiatorSpark.py, with the following code:
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .appName("Test") \
    .getOrCreate()

lines = (spark
         .readStream
         .format("org.apache.bahir.sql.streaming.mqtt.MQTTStreamSourceProvider")
         .option("topic", "my_topic")
         .load("tcp://{}".format("127.0.0.1:1883")))
I run it as follows:
spark-submit --jars lib/spark-sql-streaming-mqtt_2.11-2.2.1.jar InitiatorSpark.py
Spark starts, but then fails at the line .load("tcp://{}".format("127.0.0.1:1883"))) with the following message:
Caused by: java.lang.ClassNotFoundException: org.eclipse.paho.client.mqttv3.MqttClientPersistence
Although I supplied what I believe is the correct JAR file, the class MqttClientPersistence apparently cannot be found. Inside lib there are two files:
spark-streaming-mqtt_2.11-2.2.1-sources.jar
spark-streaming-mqtt_2.11-2.2.1.jar
What is wrong with my setup?
Answer 0 (score: 0)
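I was able to run this code by adding 3 JAR files to the spark-submit command, including the Eclipse Paho client JAR that contains the missing class: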
spark-submit --jars lib/spark-streaming-mqtt_2.11-2.2.1.jar,lib/spark-sql-streaming-mqtt_2.11-2.2.1.jar,lib/org.eclipse.paho.client.mqttv3-1.2.0.jar InitiatorSpark.py
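If it is unclear which JAR provides a class named in a ClassNotFoundException, one way to check is to look inside the archives, since a JAR is a zip file and a class a.b.C is stored as the entry a/b/C.class. The following is a minimal sketch under that assumption; the helper name find_jars_with_class is hypothetical, and the lib directory is the one from the question.

```python
import glob
import os
import zipfile


def find_jars_with_class(class_name, jar_dir):
    """Return the sorted paths of JARs in jar_dir that contain class_name.

    Hypothetical helper for illustration; it is not part of Spark or Bahir.
    """
    # A class a.b.C is stored inside a JAR as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for jar_path in sorted(glob.glob(os.path.join(jar_dir, "*.jar"))):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                if entry in jar.namelist():
                    matches.append(jar_path)
        except zipfile.BadZipFile:
            # Skip files that are not valid zip archives.
            continue
    return matches


if __name__ == "__main__":
    missing = "org.eclipse.paho.client.mqttv3.MqttClientPersistence"
    hits = find_jars_with_class(missing, "lib")
    if hits:
        # Every JAR listed here is a candidate for the --jars option.
        print("Found in:", ",".join(hits))
    else:
        print("No JAR in lib contains", missing)
```

If no JAR is reported, the class has to come from an additional JAR passed to --jars, because --jars only adds the listed files and does not fetch their dependencies. An alternative that may avoid listing them by hand is spark-submit's --packages option, which resolves Maven coordinates together with their transitive dependencies; this assumes the machine can reach a Maven repository.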