I'm using Java Spark, and I'm trying to get a SparkSession inside a foreachPartition (on an RDD). What I'm trying is:
public static void main(String[] args) {
    SparkConf conf = new SparkConf().setAppName("myappName").setMaster("mymaster");
    Integer duration = 2000;
    JavaStreamingContext ssc = new JavaStreamingContext(conf, new Duration(duration));
    // Get the Kafka messages as a stream
    JavaInputDStream<ConsumerRecord<Object, String>> stream = KafkaUtils.createDirectStream(ssc,
            LocationStrategies.PreferConsistent(),
            ConsumerStrategies.Subscribe(inputTopics, inputKafkaParams));
    stream.foreachRDD(rdd -> {
        // my logic
        rdd.foreachPartition(iterator -> {
            SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate(); // null
            Dataset<String> ds = sparkSession.createDataset(myJsonList /* ArrayList<String> */,
                    org.apache.spark.sql.Encoders.STRING());
        });
    });
}
But the following line always returns null:

SparkSession sparkSession = SparkSession.builder().config(conf).getOrCreate();

How can I get a SparkSession inside the partition loop? I just want to use Spark SQL to run some logic on the JSON and get a Dataset, etc. How can I solve this?
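I did find examples where the SparkSession is built inside foreachRDD (which, as far as I understand, runs on the driver) rather than inside foreachPartition (which runs on the executors). This is only a sketch of what I mean, not my real code — the class name and the mapping step are my own placeholders:

```java
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.streaming.api.java.JavaInputDStream;

public class StreamSketch {

    static void handle(JavaInputDStream<ConsumerRecord<Object, String>> stream) {
        stream.foreachRDD(rdd -> {
            // The foreachRDD body runs on the driver, so getOrCreate()
            // should return a real session here (unlike inside foreachPartition)
            SparkSession spark = SparkSession.builder()
                    .config(rdd.context().getConf())
                    .getOrCreate();

            // Pull the JSON strings out of the Kafka records...
            JavaRDD<String> json = rdd.map(ConsumerRecord::value);

            // ...and build the Dataset on the driver side
            Dataset<String> ds = spark.createDataset(json.rdd(), Encoders.STRING());
            ds.show();
        });
    }
}
```

Is that the right approach, or is there some way to obtain a session inside foreachPartition itself?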
*I may have mistyped the code while copying it here.
Thanks!