How to output multiple (key, value) pairs from a Spark map function

Time: 2016-10-13 11:46:10

Tags: scala apache-spark spark-dataframe

The input data has the following format:

+-------------+-------+-------+
|    StudentID|  Right|  Wrong|
+-------------+-------+-------+
|  studentNo01|  a,b,c|  x,y,z|
|  studentNo02|    c,d|    v,w|
+-------------+-------+-------+

The output should be a set of (key, value) pairs, where Right means 1 and Wrong means 0.

I want to process this data with a Spark map function or a UDF, but I don't know how to go about it. Can you help me? Thanks.

1 answer:

Answer 0: (score: 3)

Use split and explode twice, once for each column, and then union the two results.

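// Assumes spark-shell, where the implicits needed for toDF and the 'col
// syntax are already imported; elsewhere, import spark.implicits._
// (sqlContext.implicits._ on Spark 1.x).
val df = List(
  ("studentNo01","a,b,c","x,y,z"),
  ("studentNo02","c,d","v,w")
).toDF("StudenID","Right","Wrong")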

+-----------+-----+-----+
|   StudenID|Right|Wrong|
+-----------+-----+-----+
|studentNo01|a,b,c|x,y,z|
|studentNo02|  c,d|  v,w|
+-----------+-----+-----+


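import org.apache.spark.sql.functions.{concat_ws, explode, lit, split}

// Explode each comma-separated column into one row per answer
// (explode names its output column "col"), build the key
// "StudenID,answer", and tag Right with 1 and Wrong with 0.
// On Spark 2.0+, unionAll is deprecated in favor of union.
val pair = (
  df.select('StudenID,explode(split('Right,",")))
    .select(concat_ws(",",'StudenID,'col).as("key"))
    .withColumn("value",lit(1))
).unionAll(
  df.select('StudenID,explode(split('Wrong,",")))
    .select(concat_ws(",",'StudenID,'col).as("key"))
    .withColumn("value",lit(0))
)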


+-------------+-----+
|          key|value|
+-------------+-----+
|studentNo01,a|    1|
|studentNo01,b|    1|
|studentNo01,c|    1|
|studentNo02,c|    1|
|studentNo02,d|    1|
|studentNo01,x|    0|
|studentNo01,y|    0|
|studentNo01,z|    0|
|studentNo02,v|    0|
|studentNo02,w|    0|
+-------------+-----+

You can convert the result to an RDD as follows:

// Go through .rdd explicitly: on Spark 2.x, pair.map would return a Dataset, not an RDD.
val rdd = pair.rdd.map(r => (r.getString(0), r.getInt(1)))
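
Since the question asks about a map function specifically, it may help to see that emitting several (key, value) pairs per input row is what flatMap is for. The following is a minimal sketch in plain Scala with no Spark dependency, so the logic can be checked in isolation. The names FlatMapSketch and toPairs are hypothetical and not part of the original answer, and the sample rows are copied from the DataFrame above.

```scala
// Hypothetical helper, not from the original answer: turns one input row
// into many (key, value) pairs. Plain Scala collections stand in for an RDD.
object FlatMapSketch {
  def toPairs(studentId: String, right: String, wrong: String): Seq[(String, Int)] = {
    // Answers listed under Right get the value 1.
    val rightPairs = right.split(",").toSeq.map(a => (s"$studentId,$a", 1))
    // Answers listed under Wrong get the value 0.
    val wrongPairs = wrong.split(",").toSeq.map(a => (s"$studentId,$a", 0))
    rightPairs ++ wrongPairs
  }

  // Same sample data as the DataFrame in the answer.
  val rows = Seq(
    ("studentNo01", "a,b,c", "x,y,z"),
    ("studentNo02", "c,d", "v,w")
  )

  // flatMap flattens the per-row sequences into a single collection of pairs.
  val pairs: Seq[(String, Int)] =
    rows.flatMap { case (id, r, w) => toPairs(id, r, w) }
}
```

Assuming the df defined above, the same helper could presumably be applied on the Spark side with df.rdd.flatMap(r => FlatMapSketch.toPairs(r.getString(0), r.getString(1), r.getString(2))), which should yield an RDD[(String, Int)]. Unlike the union-based version, the pairs come out grouped by input row rather than with all the 1 values first.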