Type Mismatch for Reduce

Time: 2015-06-29 09:52:39

Tags: apache-spark reduce

I have an RDD:

JavaRDD<Tuple2<Tuple2<String, Long>, Long>> mappedRdd = dataRDD
    .values().map(mapFunc);

I want to run a reduce function on it:

private static Function2<Tuple2<Tuple2<String, Long>, Long>, Tuple2<Tuple2<String, Long>, Long>, Tuple2<Tuple2<String, Long>, Long>> redFunc2 = new Function2<Tuple2<Tuple2<String, Long>, Long>, Tuple2<Tuple2<String, Long>, Long>, Tuple2<Tuple2<String, Long>, Long>>() {

@Override
public Tuple2<String, MetricDatum> call(
  Tuple2<Tuple2<String, Long>, Long> v1,
  Tuple2<Tuple2<String, Long>, Long> v2) throws Exception {
  long sum = 0L; // sum up the values
  sum += v1._2();
  sum += v2._2();

  String dimension = v1._1()._1();
  long timestamp = v1._1()._2();

  MetricDatum metricDatum = new MetricDatum();
  metricDatum.setMetricDimension(dimension);
  metricDatum.setTimestamp(timestamp);

  String key = metricDatum.getMetricDimension().toString();
  key += "_" + Long.toString(timestamp);
  metricDatum.setMetric(sum);
  return new Tuple2<>(key, metricDatum);
}

};

However, the following line gives an error:

JavaRDD<Tuple2<Tuple2<String, Long>, Long>>  reducedGoraRdd = mappedRdd.reduce(redFunc);

I am trying to do this along the lines of the Spark LogAnalytics.java example.

Am I missing something? Should I use flatMap or something similar, or is the reduce function completely wrong?

1 Answer:

Answer 0 (score: 0)

Based on the reduce function in LogAnalytics.java, I wrote this:

//dummy 
class MetricDatum {
    public void setMetricDimension(String l) {}
    public void setTimestamp(Long l) {}
    public void setMetric(Long l) {}
    public Object getMetricDimension() {return new Object();}
}
//fake input
JavaRDD<Tuple2<Tuple2<String, Long>, Long>> mappedRdd = sc.emptyRDD();

//creating JavaPairRDD from JavaRDD of pairs
JavaPairRDD.fromJavaRDD(mappedRdd)
//reduce with commutative, associative function (Long, Long) -> Long
.reduceByKey(new Function2<Long, Long, Long>() {
    @Override
    public Long call(Long aLong, Long aLong2) throws Exception {
        return aLong + aLong2;
    }
})
//map (key, sum) pairs to (newKey, metricDatum(sum)), creating a JavaPairRDD
.mapToPair(new PairFunction<Tuple2<Tuple2<String,Long>,Long>, String, MetricDatum>() {
    @Override
    public Tuple2<String, MetricDatum> 
            call(Tuple2<Tuple2<String, Long>, Long> tuple2LongTuple2) throws Exception {
        String dimension = tuple2LongTuple2._1()._1();
        long timestamp = tuple2LongTuple2._1()._2();

        MetricDatum metricDatum = new MetricDatum();
        metricDatum.setMetricDimension(dimension);
        metricDatum.setTimestamp(timestamp);

        String key = metricDatum.getMetricDimension().toString();
        key += "_" + Long.toString(timestamp);
        metricDatum.setMetric(tuple2LongTuple2._2());
        return new Tuple2<String, MetricDatum>(key, metricDatum);
    }
});
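
For context on why the original call does not compile: `JavaRDD<T>.reduce` takes a `Function2<T, T, T>` and returns a single `T`, not another RDD, and the `call` method in the question is declared to return `Tuple2<String, MetricDatum>` instead of that `T`. Summing per key and then reshaping the result, as above, avoids both problems. To check the logic without a cluster, here is a minimal plain-Java sketch of the same two steps. It uses `java.util.stream` as a stand-in and is not Spark code; the `Rec` class, the `sumByKey` method and the sample values are hypothetical names made up for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ReduceByKeySketch {

    // Hypothetical stand-in for Tuple2<Tuple2<String, Long>, Long>:
    // ((dimension, timestamp), value).
    static final class Rec {
        final String dimension;
        final long timestamp;
        final long value;

        Rec(String dimension, long timestamp, long value) {
            this.dimension = dimension;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    static Map<String, Long> sumByKey(List<Rec> records) {
        // Step 1 (plays the role of reduceByKey): sum the values per (dimension, timestamp).
        Map<SimpleImmutableEntry<String, Long>, Long> sums = records.stream()
                .collect(Collectors.groupingBy(
                        r -> new SimpleImmutableEntry<String, Long>(r.dimension, r.timestamp),
                        Collectors.summingLong(r -> r.value)));

        // Step 2 (plays the role of mapToPair): rename each key to "dimension_timestamp".
        Map<String, Long> result = new TreeMap<>();
        for (Map.Entry<SimpleImmutableEntry<String, Long>, Long> e : sums.entrySet()) {
            result.put(e.getKey().getKey() + "_" + e.getKey().getValue(), e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        List<Rec> records = Arrays.asList(
                new Rec("cpu", 100L, 2L),
                new Rec("cpu", 100L, 3L),
                new Rec("mem", 100L, 7L));
        System.out.println(sumByKey(records)); // prints {cpu_100=5, mem_100=7}
    }
}
```

The addition in step 1 is commutative and associative, which is what `reduceByKey` requires of its function. In the Spark version, step 2 would build a `MetricDatum` instead of keeping the bare sum.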