How to print <String, Array[]> as flat pairs?

Time: 2015-03-20 19:40:51

Tags: java apache-spark

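Setup:

I have data about customers and their top ten favorite TV shows. So far I have been able to get this data into a JavaRDD<Tuple2<String, Shows[]>>. I can print it, and I have checked that it is as expected.

Goal:

Now I need to print this data to a file in the following format: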

Customer_1 Fav_TV_Show_1
Customer_1 Fav_TV_Show_2
Customer_1 Fav_TV_Show_3
Customer_1 Fav_TV_Show_4
Customer_2 Fav_TV_Show_1
Customer_2 Fav_TV_Show_2
Customer_2 Fav_TV_Show_3
Customer_2 Fav_TV_Show_4
Customer_3 Fav_TV_Show_1
Customer_3 Fav_TV_Show_2
Customer_3 Fav_TV_Show_3
Customer_3 Fav_TV_Show_4

Problem:

I don't know how to do this. So far I have tried this:

// Need a flat pair back
JavaPairRDD<String, Shows> resultPairs = result.mapToPair(
        new PairFunction<Tuple2<String,Shows[]>, String, Shows>() {
            public Tuple2<String, Shows> call(Tuple2<String, Shows[]> t) {

                // But this won't work, as I have to return multiple <Customer, Show> pairs
            }
        });

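Any help is much appreciated.

1 answer:

Answer 0 (score: 7)

Well, it is a bit odd that you have a JavaRDD<Tuple2<String, Shows[]>> rather than a JavaPairRDD<String, Shows[]>, which is more convenient to work with for key-value pairs. Nevertheless, you can do the following to flatten the results: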

// convert your RDD into a PairRDD format
JavaPairRDD<String, Shows[]> pairs = result.mapToPair(new PairFunction<Tuple2<String,Shows[]>, String, Shows[]>() {
    public Tuple2<String, Shows[]> call(Tuple2<String, Shows[]> t) throws Exception {
        return t;
    }
});

// now flatMap the values in order to split them with their respective keys
JavaPairRDD<String, Shows> output = pairs.flatMapValues(
    new Function<Shows[], Iterable<Shows>>() {
        public Iterable<Shows> call(Shows[] shows) throws Exception {
            return Arrays.asList(shows);
        }
});

// do something else with them
output.foreach(new VoidFunction<Tuple2<String, Shows>>() {
    public void call(Tuple2<String, Shows> t) throws Exception {
        System.out.println(t._1() + " " + t._2());
    }
});
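
Note that foreach with System.out.println prints on the executors, so on a cluster the lines may not show up in the driver console, and it does not produce the file the question asks for. Below is a minimal sketch of the line formatting, written in plain Java without Spark so that it can be run standalone. The names PairLines, toLines and writeLines are hypothetical, and plain String show names stand in for the Shows type, whose toString() is not shown in the question.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turns already-flattened (customer, show) pairs into text lines.
final class PairLines {

    // Format every pair as "<customer> <show>", keeping the input order.
    static List<String> toLines(List<Map.Entry<String, String>> pairs) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, String> pair : pairs) {
            lines.add(pair.getKey() + " " + pair.getValue());
        }
        return lines;
    }

    // Write one line per pair to the given file, encoded as UTF-8.
    static void writeLines(List<Map.Entry<String, String>> pairs, Path target) {
        try {
            Files.write(target, toLines(pairs), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

In Spark itself, the usual route would be to map each pair in output to such a string and call saveAsTextFile on the resulting RDD, which writes a directory of part files rather than a single file.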

Alternatively, you can obtain the output RDD in a single step with flatMapToPair, manually building an Iterable of pairs from the Shows array, as follows:

JavaPairRDD<String, Shows> output = result.flatMapToPair(
    new PairFlatMapFunction<Tuple2<String, Shows[]>, String, Shows>() {
        public Iterable<Tuple2<String, Shows>> call(Tuple2<String, Shows[]> t) throws Exception {
            ArrayList<Tuple2<String, Shows>> ret = new ArrayList<>();
            for (Shows s : t._2())
                ret.add(new Tuple2<>(t._1(), s));
            return ret;
        }
    });
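
To check the flattening logic without a Spark cluster, the same idea can be sketched with plain Java streams. This is only an analogue, not Spark code: the class FlattenSketch and its flatten method are hypothetical names, a Map<String, String[]> stands in for the RDD of Tuple2<String, Shows[]>, and String stands in for Shows.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical, Spark-free analogue of the one-step flattening.
final class FlattenSketch {

    // For each (customer, shows[]) entry, emit one (customer, show) pair per array element.
    static List<Map.Entry<String, String>> flatten(Map<String, String[]> favourites) {
        return favourites.entrySet().stream()
                .flatMap(e -> Arrays.stream(e.getValue())
                        .<Map.Entry<String, String>>map(show -> new SimpleEntry<>(e.getKey(), show)))
                .collect(Collectors.toList());
    }
}
```

Passing a LinkedHashMap keeps the customers in insertion order, which makes the result easy to compare against the expected output listed in the question.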

Hope this helps. Cheers!