Getting an error in Spark Structured Streaming

Date: 2019-01-07 08:11:16

标签: apache-spark apache-kafka spark-streaming

I am trying to create a POC of Spark Structured Streaming with Kafka using Python; the code is below.

Spark version - 2.3.2, Kafka - 2.11-2.1.0, Hadoop - 2.8.3


I get the below error when doing spark-submit.

.\bin\spark-submit --packages org.apache.spark:spark-sql-kafka-0-10_2.11:2.3.2 spark_struct.py localhost:9092 tempre

import sys

from pyspark.sql import SparkSession

# Create (or reuse) a SparkSession for the streaming job
spark = SparkSession \
    .builder \
    .appName("StructuredNetworkWordCount") \
    .getOrCreate()

# Read the broker list and topic name from the command-line arguments
brokers, topic = sys.argv[1:]
print("broker : {} and Topic : {}".format(brokers, topic))

# Subscribe to the Kafka topic as a streaming DataFrame
df = spark \
  .readStream \
  .format("kafka") \
  .option("kafka.bootstrap.servers", brokers) \
  .option("subscribe", topic) \
  .load()

# Cast the Kafka key/value byte payloads to strings
numbericdf = df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)")
numbericdf.createOrReplaceTempView("updates")
average = spark.sql("select value from updates")
print(average)  # prints only the DataFrame schema; streaming data is not materialized here

# Write each micro-batch of results to the console in append mode
query = average \
    .writeStream \
    .outputMode("append") \
    .format("console") \
    .start()

# Block until the streaming query terminates
query.awaitTermination()
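
To push a few test messages into the topic, one option is the console producer that ships with Kafka. This is only a sketch, assuming Kafka 2.1 is run from its installation directory on Windows and reusing the broker address and topic name (localhost:9092, tempre) passed to spark-submit above:

.\bin\windows\kafka-console-producer.bat --broker-list localhost:9092 --topic tempre

Each line typed into the producer should then show up as a row in the console sink of the streaming query.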

1 Answer:

Answer 0: (score: 0)

The issue was resolved after moving from Java 11 to Java 8.
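
For context, Spark 2.3.x targets Java 8, so running spark-submit against a Java 11 JDK can fail. A minimal way to check which JDK is picked up and to point the session at a Java 8 installation (assuming a Windows shell, as the .\bin\spark-submit path above suggests; the JDK install path is only a placeholder):

java -version
REM Point JAVA_HOME at a Java 8 JDK; the path below is a placeholder
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_201
set PATH=%JAVA_HOME%\bin;%PATH%

After that, re-running the same spark-submit command should pick up Java 8.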