PySpark: "It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation" error when using a Spark session

Date: 2020-01-20 11:43:35

Tags: pyspark

Below is my implementation. I am trying to convert data from an RDD into a DataFrame, but I get the following error:

It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation.

I understand that the SparkContext is not available inside code that runs on the workers, but I cannot get this to work with the Spark session either.

import json

from pyspark.sql import Row, SparkSession
from pyspark.streaming import StreamingContext
from pyspark.streaming.kinesis import KinesisUtils, InitialPositionInStream

spark = SparkSession.builder.appName("myjob").getOrCreate()

def evaluate_stream(record):
    """Parse one JSON record and display it as a DataFrame.

    :param record: a JSON string received from the Kinesis stream
    :return: None
    """
    data = json.loads(record.encode('utf8'))
    # This runs on a worker, where referencing the driver-side SparkSession
    # raises the error above.
    data_frame = spark.createDataFrame(Row(**x) for x in data)
    data_frame.show(truncate=False)


def printRecord(rdd):
    # foreach returns None; just apply evaluate_stream to every record
    rdd.foreach(evaluate_stream)


if __name__ == "__main__":
    sc = spark.sparkContext
    batchIntervalSeconds = 5
    ssc = StreamingContext(sc, batchIntervalSeconds)
    consumer_app_name = "myjob"
    k_stream_name = 'my-stream'
    region_name = 'us-east-1'
    endpoint_URL = 'https://kinesis.us-east-1.amazonaws.com/'
    kinesisStream = KinesisUtils.createStream(ssc=ssc, kinesisAppName=consumer_app_name,
                                              streamName=k_stream_name, endpointUrl=endpoint_URL,
                                              regionName=region_name,
                                              initialPositionInStream=InitialPositionInStream.TRIM_HORIZON,
                                              checkpointInterval=5)
    kinesisStream.foreachRDD(printRecord)
    ssc.start()
    ssc.awaitTermination()
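For reference, the usual way around this error is to keep every reference to the SparkSession on the driver: `foreachRDD` itself runs driver-side, so the batch can be collected and turned into a DataFrame there, instead of calling `spark.createDataFrame` inside `rdd.foreach` (which ships the closure to the workers, where no SparkContext exists). A minimal sketch of that pattern, where `parse_records` is a hypothetical helper and the `collect()`-based approach assumes each micro-batch is small:

```python
import json

def parse_records(records):
    # Parse a batch of JSON strings into plain dicts; runs on the driver.
    return [json.loads(r) for r in records]

def evaluate_stream(rdd):
    # foreachRDD invokes this function on the driver, so referencing the
    # SparkSession here is safe; only the record data came from the workers.
    from pyspark.sql import SparkSession  # driver-side import
    spark = SparkSession.builder.getOrCreate()
    rows = parse_records(rdd.collect())  # collect() is fine for small batches
    if rows:
        spark.createDataFrame(rows).show(truncate=False)

# Wired into the streaming job above as:
# kinesisStream.foreachRDD(evaluate_stream)
```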

0 answers:

No answers