Below is my implementation. I am trying to convert data from an RDD into a DataFrame, but I get the following error:

It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation.

I understand that the SparkContext is not available on the worker processes, but I could not make this work through the SparkSession either.
import json
from pyspark.sql import *
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

spark = SparkSession.builder.appName("myjob").getOrCreate()


def evaluate_stream(record):
    """
    :param record:
    :return:
    """
    data = json.loads(record.encode('utf8'))
    data_frame = spark.createDataFrame(Row(**x) for x in data)
    data_frame.show(truncate=False)


def printRecord(rdd):
    rdd.foreach(evaluate_stream)


if __name__ == "__main__":
    sc = spark.sparkContext
    batchIntervalSeconds = 5
    ssc = StreamingContext(sc, batchIntervalSeconds)
    consumer_app_name = "myjob"
    k_stream_name = 'my-stream'
    region_name = 'us-east-1'
    endpoint_URL = 'https://kinesis.us-east-1.amazonaws.com/'
    kinesisStream = KinesisUtils.createStream(ssc=ssc, kinesisAppName=consumer_app_name,
                                              streamName=k_stream_name, endpointUrl=endpoint_URL,
                                              regionName='us-east-1',
                                              initialPositionInStream=InitialPositionInStream.TRIM_HORIZON,
                                              checkpointInterval=5)
    kinesisStream.foreachRDD(printRecord)
    ssc.start()
    ssc.awaitTermination()