In Spark, can Datasets be passed to a function as input args and come back populated as the function's output args?

Asked: 2019-07-11 14:23:14

Tags: java apache-spark

I am new to Spark. In our project we are using Spark Structured Streaming to write a Kafka consumer. We have a use case where I need to modularize the code so that several people can work on different steps at the same time.

As a first step we read from different Kafka topics, so now I have two Datasets, say ds_input1 and ds_input2.
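For context, the reads look roughly like this (assuming spark is our SparkSession; the broker address and topic names below are placeholders, not our real config):

Dataset<Row> ds_input1 = spark.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
    .option("subscribe", "topic1")                    // placeholder topic
    .load();

Dataset<Row> ds_input2 = spark.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
    .option("subscribe", "topic2")                    // placeholder topic
    .load();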

I need to pass these Datasets on to the next steps, which other people are working on. So in Java 8 I did something like the following:

class DriverClass {
   Dataset<Row> ds_input1 = ...; // populated from a Kafka topic

   Dataset<Row> ds_output1 = null;
   SecondPersonClass.process(ds_input1, ds_output1);

   // here, outside the call, ds_output1 is still null
   // Why does this not work the way it does with a List<Object> in Java?
   // Is there anything wrong I am doing? What is the correct way to do this?

   Dataset<Row> ds_output2 = null;
   ThirdPersonClass.process(ds_output1, ds_output2);

   // here, outside the call, ds_output2 is still null
   // even though ds_output2 is assigned inside the method, why is it still null out here?
}


class SecondPersonClass {

  static void process(Dataset<Row> ds_input1, Dataset<Row> ds_output1) {
    // the business logic works on the ds_input1 data here,
    // then the result is assigned back to the output dataset, i.e. ds_output1

    // for simplicity, let's say:
    ds_output1 = ds_input1;
    // here, inside the method, I do see data in ds_output1, i.e. ds_output1 is not null
  }
}


class ThirdPersonClass {

  static void process(Dataset<Row> ds_input2, Dataset<Row> ds_output2) {
    // the business logic works on the ds_input2 data here,
    // then the result is assigned back to the output dataset, i.e. ds_output2

    // for simplicity, let's say:
    ds_output2 = ds_input2;
    // here, inside the method, I do see data in ds_output2, i.e. ds_output2 is not null
  }
}

Question: Even though the Datasets are populated inside the static methods, why is that not reflected outside the methods, and why do they stay null? Why does Java's call-by-reference on objects not work here? How should I handle this?
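What I was hoping to end up with is the called method handing its result back to the driver, roughly like this (just a sketch with a made-up method body, not our actual code):

class SecondPersonClass {
  // instead of an "out" parameter, return the result Dataset
  static Dataset<Row> process(Dataset<Row> ds_input1) {
    // ... business logic on ds_input1 goes here ...
    return ds_input1; // placeholder: return whatever the logic produces
  }
}

// in the driver:
Dataset<Row> ds_output1 = SecondPersonClass.process(ds_input1);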

If that is possible, can we also return multiple Datasets from a single function?
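For example, something along these lines is what I have in mind (a rough sketch; the method name processAll and the use of a List here are made up for illustration):

import java.util.Arrays;
import java.util.List;

class SecondPersonClass {
  // hypothetical: hand back several result Datasets at once in a List
  static List<Dataset<Row>> processAll(Dataset<Row> ds_input1, Dataset<Row> ds_input2) {
    Dataset<Row> ds_output1 = ds_input1; // placeholder for the real logic
    Dataset<Row> ds_output2 = ds_input2; // placeholder for the real logic
    return Arrays.asList(ds_output1, ds_output2);
  }
}

// in the driver:
List<Dataset<Row>> outputs = SecondPersonClass.processAll(ds_input1, ds_input2);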

0 Answers:

There are no answers yet.