I am new to Spark, and in our project we use Spark Structured Streaming to write a Kafka consumer. We have a use case where I need to modularize the code so that multiple people can work on different stages at the same time.
As a first step we read from different Kafka topics, so I now have two datasets, say ds_input1 and ds_input2.
I need to hand these over to the next stage, which other people are working on, so I did the following in Java 8:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class DriverClass {
    public static void main(String[] args) {
        Dataset<Row> ds_input1 = ...; // populated from a Kafka topic (see sketch below this class)
        Dataset<Row> ds_output1 = null;
        SecondPersonClass.process(ds_input1, ds_output1);
        // Here, outside the call, ds_output1 is still null.
        // Why does this not behave the way a List<Object> would in Java?
        // Am I doing something wrong? What is the correct way to do this?
        Dataset<Row> ds_output2 = null;
        ThirdPersonClass.process(ds_output1, ds_output2);
        // Here, outside the call, ds_output2 is also still null.
        // It is assigned inside the method, so why is it still null outside?
    }
}
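The elided Kafka read above is not the problem, but for completeness it looks roughly like this (the bootstrap server and topic name are placeholders, not our real values):

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class KafkaReadSketch {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder().appName("consumer").getOrCreate();
        // Structured Streaming Kafka source; server and topic are placeholders.
        Dataset<Row> ds_input1 = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "host1:9092")
                .option("subscribe", "input-topic-1")
                .load();
        ds_input1.printSchema(); // key, value, topic, partition, offset, timestamp, ...
    }
}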
class SecondPersonClass {
    static void process(Dataset<Row> ds_input1, Dataset<Row> ds_output1) {
        // The real business logic works on ds_input1; the result is then
        // assigned back to the output dataset, ds_output1.
        // For simplicity, say:
        ds_output1 = ds_input1;
        // Here, inside the method, ds_output1 is not null and holds data.
    }
}
class ThirdPersonClass {
    static void process(Dataset<Row> ds_input2, Dataset<Row> ds_output2) {
        // The real business logic works on ds_input2; the result is then
        // assigned back to the output dataset, ds_output2.
        // For simplicity, say:
        ds_output2 = ds_input2;
        // Here, inside the method, ds_output2 is not null and holds data.
    }
}
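For comparison, I see the same behaviour with a plain Java object whenever the parameter itself is reassigned; here is a minimal toy example (names invented just for this question) that reproduces it:

public class PassByValueDemo {
    static class Holder {
        String value;
    }

    static void process(Holder input, Holder output) {
        // Reassigning the parameter only changes this method's local copy
        // of the reference; the caller's variable is untouched.
        output = input;
    }

    public static void main(String[] args) {
        Holder input = new Holder();
        input.value = "data";
        Holder output = null;
        process(input, output);
        System.out.println(output); // prints "null"
    }
}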
Questions: even though the datasets are assigned inside the static methods, why is that assignment not reflected outside the methods, where the variables are still null? Why does call-by-reference on these objects not work here the way I expected? What is the correct way to handle this?
And, if it is possible, can we return multiple datasets from a single method?
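To make that last question concrete, this is the kind of shape I have in mind instead of the "output parameter" style: a rough sketch with invented names, using scala.Tuple2 (which is already on the classpath of a Java Spark application) as one guess at returning two datasets at once:

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import scala.Tuple2;

class ProcessingStepSketch {
    // Return the processed dataset instead of assigning to an output parameter.
    static Dataset<Row> process(Dataset<Row> ds_input1) {
        return ds_input1; // stand-in for the real business logic
    }

    // One guess at returning two datasets from a single method.
    static Tuple2<Dataset<Row>, Dataset<Row>> processBoth(Dataset<Row> ds_input1,
                                                          Dataset<Row> ds_input2) {
        return new Tuple2<>(ds_input1, ds_input2);
    }
}

The driver would then do ds_output1 = ProcessingStepSketch.process(ds_input1); -- is something along these lines the recommended pattern, or is there a more idiomatic way to chain stages between teams?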