I am writing code to flatten XML in Spark without the com.databricks utilities.
I wrote a function that converts the XML into a List<Map<String, String>>, which is later stored in an RDD. Sample code is shown below:
JavaPairRDD<Integer, List<Map<String, String>>> ones = lines.mapToPair(
        new PairFunction<Tuple2<LongWritable, Text>, Integer, List<Map<String, String>>>() {
    private static final long serialVersionUID = 1L;
    // Running counter used as the key (SrN) of each emitted pair.
    private int temp = 0;

    public Tuple2<Integer, List<Map<String, String>>> call(
            Tuple2<LongWritable, Text> t) throws Exception {
        List<Map<String, String>> List_Vals = new ArrayList<Map<String, String>>();
        Map<String, String> vals = new LinkedHashMap<String, String>();
        vals.put("Sequence", "1234");
        vals.put("customer", "1234");
        temp++;
        List_Vals.add(vals);
        Map<String, String> val2 = new LinkedHashMap<String, String>();
        val2.put("Sequence", "5678");
        val2.put("customer", "ABCDE");
        List_Vals.add(val2);
        return new Tuple2<Integer, List<Map<String, String>>>(temp, List_Vals);
    }
});
If I call ones.collect(); I get the following output:
[(1,[{Sequence=1234, customer=1234}, {Sequence=5678, customer=ABCDE}])]
But I want to get the output in this format:
SrN, Sequence, customer
1, 1234, 1234
1, 5678, ABCDE
I cannot figure out how to produce this output. Please help.
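One possible approach, sketched below under the assumption that ones has the type JavaPairRDD<Integer, List<Map<String, String>>> from the code above, is to explode each (SrN, list) pair with flatMap, emitting one comma-separated line per map. Note that FlatMapFunction.call returns an Iterator in Spark 2.x; in Spark 1.x it would return an Iterable instead.

// Needed imports (in addition to those already used above):
// import java.util.Iterator;
// import org.apache.spark.api.java.JavaRDD;
// import org.apache.spark.api.java.function.FlatMapFunction;

JavaRDD<String> rows = ones.flatMap(
        new FlatMapFunction<Tuple2<Integer, List<Map<String, String>>>, String>() {
    private static final long serialVersionUID = 1L;

    // Emit one "SrN, Sequence, customer" line for every map in the list.
    public Iterator<String> call(Tuple2<Integer, List<Map<String, String>>> t) {
        List<String> out = new ArrayList<String>();
        for (Map<String, String> m : t._2()) {
            out.add(t._1() + ", " + m.get("Sequence") + ", " + m.get("customer"));
        }
        return out.iterator();
    }
});

With this sketch, rows.collect() would return lines such as "1, 1234, 1234" and "1, 5678, ABCDE"; a header line "SrN, Sequence, customer" could be prepended before writing the result out.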