Sum values of a JavaRDD<Tuple3<String, String, Double>>

Date: 2015-12-07 14:11:44

Tags: apache-spark reduce rdd

My JavaRDD contains the values of a Cassandra table:

URL | Name | Value
A   |   x  |    1
A   |   x  |    2   
A   |   x  |    1.5
B   |   y  |    3
B   |   y  |    2.75
C   |   z  |    1.25
C   |   z  |    3 
C   |   z  |    1

I want to reduce this so that I get only one row each for A, B and C, with their values summed. I tried something like this:

JavaPairRDD<Tuple3<String, String, Double>,Double> x = y.mapToPair(new PairFunction<Tuple3<String, String, Double>, Tuple3<String, String, Double>, Double>(){

        @Override
        public Tuple2<Tuple3<String, String, Double>, Double> call(
                Tuple3<String, String, Double> arg0) throws Exception {
            // TODO Auto-generated method stub
            return null;
        }

    }); // To Do reduce

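y is of type JavaRDD<Tuple3<String, String, Double>>, but the compiler says the method is not applicable for the arguments. Is it possible to solve it this way, or is there a better solution?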

1 answer:

Answer 0 (score: 1)

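Map each row to a ((URL, Name), Value) pair with mapToPair, then call reduceByKey on the resulting JavaPairRDD. reduceByKey combines all values that share the same key and produces the final RDD.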
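To make concrete what reduceByKey computes, here is a plain-Java analogue that runs on a single machine without Spark: every value that shares a key is folded into one result with the same combining function the Spark code passes to reduceByKey (v1 + v2). The class and helper names (ReduceByKeyDemo, pair, reduceByKeyLocally) are hypothetical, and a List<String> stands in for the Tuple2<String, String> key. This is only a sketch of the semantics; Spark's reduceByKey additionally combines values inside each partition before shuffling, which is not modelled here.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;

public class ReduceByKeyDemo {

    // Hypothetical helper: builds one ((URL, Name), Value) pair, mirroring the
    // Tuple2<Tuple2<String, String>, Double> produced by mapToPair.
    static Map.Entry<List<String>, Double> pair(String url, String name, double value) {
        return new SimpleEntry<>(Arrays.asList(url, name), value);
    }

    // Hypothetical single-machine analogue of reduceByKey: all values that share
    // a key are folded into one result with the supplied associative function.
    static <K, V> Map<K, V> reduceByKeyLocally(List<Map.Entry<K, V>> pairs, BinaryOperator<V> func) {
        Map<K, V> result = new LinkedHashMap<>(); // keeps first-seen key order for printing
        for (Map.Entry<K, V> p : pairs) {
            // merge() stores the value for a new key, otherwise replaces it with func(old, new)
            result.merge(p.getKey(), p.getValue(), func);
        }
        return result;
    }

    public static void main(String[] args) {
        // The rows of the sample table from the question
        List<Map.Entry<List<String>, Double>> pairs = Arrays.asList(
                pair("A", "x", 1.0), pair("A", "x", 2.0), pair("A", "x", 1.5),
                pair("B", "y", 3.0), pair("B", "y", 2.75),
                pair("C", "z", 1.25), pair("C", "z", 3.0), pair("C", "z", 1.0));

        // Same combining function as the Function2 passed to reduceByKey: v1 + v2
        Map<List<String>, Double> sums = reduceByKeyLocally(pairs, (v1, v2) -> v1 + v2);
        System.out.println(sums); // prints {[A, x]=4.5, [B, y]=5.75, [C, z]=5.25}
    }
}
```

For the sample table this gives A = 1 + 2 + 1.5 = 4.5, B = 3 + 2.75 = 5.75 and C = 1.25 + 3 + 1 = 5.25, which is the result the question asks for.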

Try this code:

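import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;
import scala.Tuple3;

JavaRDD<Tuple3<String, String, Double>> x = ...........;

// Key each row by (URL, Name) and keep Value as the pair's value.
JavaPairRDD<Tuple2<String, String>, Double> result = x.mapToPair(
        new PairFunction<Tuple3<String, String, Double>, Tuple2<String, String>, Double>() {
            @Override
            public Tuple2<Tuple2<String, String>, Double> call(Tuple3<String, String, Double> t)
                    throws Exception {
                return new Tuple2<Tuple2<String, String>, Double>(
                        new Tuple2<String, String>(t._1(), t._2()), t._3());
            }
        })
        // Sum all values that share the same (URL, Name) key.
        .reduceByKey(new Function2<Double, Double, Double>() {
            @Override
            public Double call(Double v1, Double v2) throws Exception {
                return v1 + v2;
            }
        });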
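The choice of key is what makes this work. The mapToPair in the question declares Tuple3<String, String, Double> as the key type, so if each whole row were used as the key, Value would be part of it, and reduceByKey only merges records whose keys are equal. The plain-Java sketch below (no Spark; the class, the Row type and the method names are hypothetical) counts how many groups the sample rows form under each choice of key.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class KeyChoiceDemo {

    // Hypothetical stand-in for one Tuple3<String, String, Double> row
    static final class Row {
        final String url;
        final String name;
        final double value;

        Row(String url, String name, double value) {
            this.url = url;
            this.name = name;
            this.value = value;
        }
    }

    // The rows of the sample table from the question
    static final List<Row> ROWS = Arrays.asList(
            new Row("A", "x", 1.0), new Row("A", "x", 2.0), new Row("A", "x", 1.5),
            new Row("B", "y", 3.0), new Row("B", "y", 2.75),
            new Row("C", "z", 1.25), new Row("C", "z", 3.0), new Row("C", "z", 1.0));

    // Number of distinct keys when the key is the whole row (URL, Name, Value)
    static int groupsKeyedByWholeRow(List<Row> rows) {
        Set<List<Object>> keys = new HashSet<>();
        for (Row r : rows) {
            keys.add(Arrays.<Object>asList(r.url, r.name, r.value));
        }
        return keys.size();
    }

    // Number of distinct keys when the key is only (URL, Name), like the Tuple2 key above
    static int groupsKeyedByUrlAndName(List<Row> rows) {
        Set<List<String>> keys = new HashSet<>();
        for (Row r : rows) {
            keys.add(Arrays.asList(r.url, r.name));
        }
        return keys.size();
    }

    public static void main(String[] args) {
        System.out.println(groupsKeyedByWholeRow(ROWS));   // prints 8: every row is its own group
        System.out.println(groupsKeyedByUrlAndName(ROWS)); // prints 3: one group each for A, B, C
    }
}
```

With the whole row as the key, nothing in the sample collapses, because no two rows share the same (URL, Name, Value); keying by (URL, Name) yields exactly one group per URL.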