My JavaRDD contains the values of a Cassandra table:
URL | Name | Value
A | x | 1
A | x | 2
A | x | 1.5
B | y | 3
B | y | 2.75
C | z | 1.25
C | z | 3
C | z | 1
I want to reduce this so that there is only one row each for A, B and C, with the values summed up. I tried something like this:
JavaPairRDD<Tuple3<String, String, Double>, Double> x = y.mapToPair(
        new PairFunction<Tuple3<String, String, Double>, Tuple3<String, String, Double>, Double>() {
            @Override
            public Tuple2<Tuple3<String, String, Double>, Double> call(
                    Tuple3<String, String, Double> arg0) throws Exception {
                // TODO Auto-generated method stub
                return null;
            }
        }); // To Do reduce
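y is a JavaRDD, but the compiler says the method is not applicable for the arguments. Is it possible to solve it this way, or is there a better solution?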
Answer 0 (score: 1)
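Use the reduceByKey function, which is available on a JavaPairRDD once you have mapped the rows to key/value pairs. It combines all values that share a key and produces a final RDD with one entry per key.
Try this code: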
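import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;
import scala.Tuple3;

JavaRDD<Tuple3<String, String, Double>> x = ...........;

// Key each row by (URL, Name) and keep Value as the pair's value.
JavaPairRDD<Tuple2<String, String>, Double> result = x.mapToPair(
        new PairFunction<Tuple3<String, String, Double>, Tuple2<String, String>, Double>() {
            @Override
            public Tuple2<Tuple2<String, String>, Double> call(
                    Tuple3<String, String, Double> t) throws Exception {
                return new Tuple2<Tuple2<String, String>, Double>(
                        new Tuple2<String, String>(t._1(), t._2()), t._3());
            }
        })
        // Sum the values of all rows that share the same key.
        .reduceByKey(new Function2<Double, Double, Double>() {
            @Override
            public Double call(Double v1, Double v2) throws Exception {
                return v1 + v2;
            }
        });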
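
If you want to sanity-check the totals before running this on the cluster, here is a rough plain-Java sketch (no Spark involved) that does the same "key by (URL, Name), then sum" step on the sample rows from the question. The class name SumByKeySketch, the Row type and the helper methods are made up for illustration; Map.merge with Double::sum just plays the role that reduceByKey plays on the pair RDD.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class SumByKeySketch {

    // Hypothetical stand-in for Tuple3<String, String, Double>.
    static final class Row {
        final String url;
        final String name;
        final double value;

        Row(String url, String name, double value) {
            this.url = url;
            this.name = name;
            this.value = value;
        }
    }

    // The sample rows from the question.
    static List<Row> sampleRows() {
        return Arrays.asList(
                new Row("A", "x", 1), new Row("A", "x", 2), new Row("A", "x", 1.5),
                new Row("B", "y", 3), new Row("B", "y", 2.75),
                new Row("C", "z", 1.25), new Row("C", "z", 3), new Row("C", "z", 1));
    }

    // Builds a "URL | Name" key per row and sums the values per key.
    // A TreeMap is used only to get a stable, sorted iteration order.
    static Map<String, Double> sumByKey(List<Row> rows) {
        Map<String, Double> totals = new TreeMap<>();
        for (Row r : rows) {
            // merge() stores the value for a new key, otherwise combines
            // the old and new value with the given function (here: addition).
            totals.merge(r.url + " | " + r.name, r.value, Double::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        sumByKey(sampleRows())
                .forEach((key, sum) -> System.out.println(key + " | " + sum));
    }
}
```

For the sample data this gives 4.5 for A, 5.75 for B and 5.25 for C, so that is what the reduceByKey result should contain as well.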