Getting a SparkSession inside a partition loop

Time: 2018-07-08 09:24:21

Tags: apache-spark apache-spark-sql spark-streaming

I am using Java Spark, and I am trying to get a SparkSession inside foreachPartition(rdd). What I am trying is:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.SparkConf;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.streaming.Duration;
import org.apache.spark.streaming.api.java.JavaInputDStream;
import org.apache.spark.streaming.api.java.JavaStreamingContext;
import org.apache.spark.streaming.kafka010.ConsumerStrategies;
import org.apache.spark.streaming.kafka010.KafkaUtils;
import org.apache.spark.streaming.kafka010.LocationStrategies;

public static void main(String[] args) {

    SparkConf conf = new SparkConf().setAppName("myappName").setMaster("mymaster");

    Integer duration = 2000;

    JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(duration));

    // Read the Kafka messages as a stream
    JavaInputDStream<ConsumerRecord<Object, String>> stream = KafkaUtils.createDirectStream(
            ssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.Subscribe(inputTopics, inputKafkaParams));

    stream.foreachRDD(rdd -> {

        // my logic
        rdd.foreachPartition(iterator -> {
            SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate(); // always null here
            Dataset<String> ds = sparkSession.createDataset(myJsonList /*ArrayList<String>*/, org.apache.spark.sql.Encoders.STRING());
        });
    });
}

But the line SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate(); always returns null. How can I get a SparkSession inside the partition loop? All I want is to use Spark SQL to run some logic on the JSON and get a Dataset, etc.

How can I solve this?

*I may have made typos while typing in the code.

Thanks!
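Edit: a minimal sketch of the workaround I am considering, in case it clarifies the question (not verified). My understanding is that a SparkSession is only usable on the driver, and that foreachRDD runs on the driver while foreachPartition runs on the executors, so the idea is to collect the JSON strings per batch and build the Dataset inside foreachRDD instead. This assumes each batch is small enough to collect to the driver; jsonList is just a local name for illustration.

import java.util.List;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

// Replaces the stream.foreachRDD(...) block above
stream.foreachRDD(rdd -> {
    // foreachRDD runs on the driver, so getOrCreate() returns a live session here;
    // the session is built from the RDD's own SparkConf instead of the captured `conf`.
    SparkSession sparkSession = SparkSession.builder()
            .config(rdd.context().getConf())
            .getOrCreate();

    // Pull the JSON payloads back to the driver (only viable for small batches)
    List<String> jsonList = rdd.map(record -> record.value()).collect();

    // Now Spark SQL can be used as usual
    Dataset<String> ds = sparkSession.createDataset(jsonList, Encoders.STRING());
    Dataset<Row> parsed = sparkSession.read().json(ds);
    parsed.show();
});

The key difference from my snippet above is that getOrCreate() is never called inside foreachPartition, where no session exists on the executor.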

0 Answers:

There are no answers.