In Spark, can Datasets be passed to a function as input args and come back populated as the function's output args?

Asked: 2019-07-11 14:23:14

Tags: java apache-spark

I am new to Spark. In our project we are using Spark Structured Streaming to write a Kafka consumer. We have a use case where I need to modularize the code so that several people can work on different steps at the same time.

As a first step we read from different Kafka topics, so now I have two Datasets, say ds_input1 and ds_input2.
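For context, the reads look roughly like this (assuming spark is our SparkSession; the broker address and topic names below are placeholders, not our real config):

Dataset<Row> ds_input1 = spark.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
    .option("subscribe", "topic1")                    // placeholder topic
    .load();

Dataset<Row> ds_input2 = spark.readStream()
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092") // placeholder broker
    .option("subscribe", "topic2")                    // placeholder topic
    .load();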

I need to pass these Datasets on to the next steps, which other people are working on. So in Java 8 I did something like the following:

class DriverClass {
   Dataset<Row> ds_input1 = ...; // populated from a Kafka topic

   Dataset<Row> ds_output1 = null;
   SecondPersonClass.process(ds_input1, ds_output1);

   // here, outside the call, ds_output1 is still null
   // Why does this not work the way it does with a List<Object> in Java?
   // Is there anything wrong I am doing? What is the correct way to do this?

   Dataset<Row> ds_output2 = null;
   ThirdPersonClass.process(ds_output1, ds_output2);

   // here, outside the call, ds_output2 is still null
   // even though ds_output2 is assigned inside the method, why is it still null out here?
}


class SecondPersonClass {

  static void process(Dataset<Row> ds_input1, Dataset<Row> ds_output1) {
    // the business logic works on the ds_input1 data here,
    // then the result is assigned back to the output dataset, i.e. ds_output1

    // for simplicity, let's say:
    ds_output1 = ds_input1;
    // here, inside the method, I do see data in ds_output1, i.e. ds_output1 is not null
  }
}


class ThirdPersonClass {

  static void process(Dataset<Row> ds_input2, Dataset<Row> ds_output2) {
    // the business logic works on the ds_input2 data here,
    // then the result is assigned back to the output dataset, i.e. ds_output2

    // for simplicity, let's say:
    ds_output2 = ds_input2;
    // here, inside the method, I do see data in ds_output2, i.e. ds_output2 is not null
  }
}

Question: Even though the Datasets are populated inside the static methods, why is that not reflected outside the methods, and why do they stay null? Why does Java's call-by-reference on objects not work here? How should I handle this?
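What I was hoping to end up with is the called method handing its result back to the driver, roughly like this (just a sketch with a made-up method body, not our actual code):

class SecondPersonClass {
  // instead of an "out" parameter, return the result Dataset
  static Dataset<Row> process(Dataset<Row> ds_input1) {
    // ... business logic on ds_input1 goes here ...
    return ds_input1; // placeholder: return whatever the logic produces
  }
}

// in the driver:
Dataset<Row> ds_output1 = SecondPersonClass.process(ds_input1);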

If that is possible, can we also return multiple Datasets from a single function?
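For example, something along these lines is what I have in mind (a rough sketch; the method name processAll and the use of a List here are made up for illustration):

import java.util.Arrays;
import java.util.List;

class SecondPersonClass {
  // hypothetical: hand back several result Datasets at once in a List
  static List<Dataset<Row>> processAll(Dataset<Row> ds_input1, Dataset<Row> ds_input2) {
    Dataset<Row> ds_output1 = ds_input1; // placeholder for the real logic
    Dataset<Row> ds_output2 = ds_input2; // placeholder for the real logic
    return Arrays.asList(ds_output1, ds_output2);
  }
}

// in the driver:
List<Dataset<Row>> outputs = SecondPersonClass.processAll(ds_input1, ds_input2);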

0 Answers:

There are no answers yet.