我们正在编写spark流应用程序,以使用createStream方法读取kafka消息,并且批处理间隔为180秒。
代码成功地工作并每180秒将文件创建到s3存储桶中,但文件中没有消息。下面是环境
火花2.3.0 Kakfa 1.0
请检查代码,并在此让我知道任何错误
#import dependencies
import findspark
findspark.init()
from pyspark import SparkContext
from pyspark.streaming import StreamingContext
from pyspark.streaming.kafka import KafkaUtils
import json
from pyspark.sql import *
sc = SparkContext(appName="SparkStreamingwithPython").getOrCreate()
sc.setLogLevel("WARN")
ssc = StreamingContext(sc,180)
topic="thirdtopic"
ZkQuorum = "localhost:2181"
kakfaStream = KafkaUtils.createStream(ssc,ZkQuorum,"Spark-Streaming-Consumer",{topic:1})
def WritetoS3(rdd):
rdd.saveAsTextFile("s3://BucketName/thirdtopic/SparkOut")
kakfaStream.foreachRDD(WritetoS3)
ssc.start()
ssc.awaitTermination()
谢谢。