I have an RDD:
JavaRDD<Tuple2<Tuple2<String, Long>, Long>> mappedRdd = dataRDD
.values().map(mapFunc);
I want to run a reduce function on it:
private static Function2<Tuple2<Tuple2<String, Long>, Long>, Tuple2<Tuple2<String, Long>, Long>, Tuple2<Tuple2<String, Long>, Long>> redFunc2 = new Function2<Tuple2<Tuple2<String, Long>, Long>, Tuple2<Tuple2<String, Long>, Long>, Tuple2<Tuple2<String, Long>, Long>>() {
    @Override
    public Tuple2<String, MetricDatum> call(
            Tuple2<Tuple2<String, Long>, Long> v1,
            Tuple2<Tuple2<String, Long>, Long> v2) throws Exception {
        long sum = 0L; // sum up the values
        sum += v1._2();
        sum += v2._2();
        String dimension = v1._1()._1();
        long timestamp = v1._1()._2();
        MetricDatum metricDatum = new MetricDatum();
        metricDatum.setMetricDimension(dimension);
        metricDatum.setTimestamp(timestamp);
        String key = metricDatum.getMetricDimension().toString();
        key += "_" + Long.toString(timestamp);
        metricDatum.setMetric(sum);
        return new Tuple2<>(key, metricDatum);
    }
};
However, this line gives an error:
JavaRDD<Tuple2<Tuple2<String, Long>, Long>> reducedGoraRdd = mappedRdd.reduce(redFunc);
I am trying to follow the Spark LogAnalytics.java example. Am I missing something? Should I use flatMap or something similar, or is the reduce function completely wrong?
Answer 0 (score: 0)
Based on the reduce function in LogAnalytics.java, I wrote the following:
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;
import scala.Tuple2;

// Dummy stand-in for the real MetricDatum class
class MetricDatum {
    public void setMetricDimension(String l) {}
    public void setTimestamp(Long l) {}
    public void setMetric(Long l) {}
    public Object getMetricDimension() { return new Object(); }
}

// Fake input (sc is an existing JavaSparkContext)
JavaRDD<Tuple2<Tuple2<String, Long>, Long>> mappedRdd = sc.emptyRDD();

// Create a JavaPairRDD from the JavaRDD of pairs
JavaPairRDD.fromJavaRDD(mappedRdd)
    // Reduce with a commutative, associative function (Long, Long) -> Long
    .reduceByKey(new Function2<Long, Long, Long>() {
        @Override
        public Long call(Long aLong, Long aLong2) throws Exception {
            return aLong + aLong2;
        }
    })
    // Map (key, sum) pairs to (newKey, metricDatum(sum)), creating a new JavaPairRDD
    .mapToPair(new PairFunction<Tuple2<Tuple2<String, Long>, Long>, String, MetricDatum>() {
        @Override
        public Tuple2<String, MetricDatum> call(
                Tuple2<Tuple2<String, Long>, Long> tuple2LongTuple2) throws Exception {
            String dimension = tuple2LongTuple2._1()._1();
            long timestamp = tuple2LongTuple2._1()._2();
            MetricDatum metricDatum = new MetricDatum();
            metricDatum.setMetricDimension(dimension);
            metricDatum.setTimestamp(timestamp);
            String key = metricDatum.getMetricDimension().toString();
            key += "_" + Long.toString(timestamp);
            metricDatum.setMetric(tuple2LongTuple2._2());
            return new Tuple2<String, MetricDatum>(key, metricDatum);
        }
    });
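Some context on why the attempt in the question does not compile: JavaRDD.reduce takes a Function2<T, T, T> and returns a single T. It therefore cannot turn two Tuple2<Tuple2<String, Long>, Long> inputs into a Tuple2<String, MetricDatum>, and its result cannot be assigned to a JavaRDD. Summing per key first and reshaping the result afterwards avoids both problems. The sketch below imitates that two-step shape with plain java.util.stream collections, so it runs without Spark. It is only an analogy, not Spark code, and the names ReduceByKeySketch, Sample, sumPerKey and total are hypothetical, introduced here for illustration.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class ReduceByKeySketch {

    // Hypothetical stand-in for one ((dimension, timestamp), value) element of mappedRdd.
    public static final class Sample {
        final String dimension;
        final long timestamp;
        final long value;

        public Sample(String dimension, long timestamp, long value) {
            this.dimension = dimension;
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    public static Map<String, Long> sumPerKey(List<Sample> samples) {
        // Step 1 (analogue of reduceByKey): merge values that share a
        // (dimension, timestamp) key with a (Long, Long) -> Long function.
        Map<SimpleImmutableEntry<String, Long>, Long> sums = samples.stream()
            .collect(Collectors.toMap(
                s -> new SimpleImmutableEntry<String, Long>(s.dimension, s.timestamp),
                s -> s.value,
                Long::sum));

        // Step 2 (analogue of mapToPair): reshape each (key, sum) pair into
        // ("dimension_timestamp", sum). The output type may differ from the input type here.
        Map<String, Long> result = new TreeMap<>();
        sums.forEach((key, sum) -> result.put(key.getKey() + "_" + key.getValue(), sum));
        return result;
    }

    // A plain reduce collapses everything into ONE value of the element type,
    // which is why it cannot yield a collection of per-key results.
    public static long total(List<Sample> samples) {
        return samples.stream().map(s -> s.value).reduce(0L, Long::sum);
    }

    public static void main(String[] args) {
        List<Sample> samples = Arrays.asList(
            new Sample("cpu", 100L, 1L),
            new Sample("cpu", 100L, 2L),
            new Sample("mem", 100L, 5L));

        System.out.println(sumPerKey(samples)); // {cpu_100=3, mem_100=5}
        System.out.println(total(samples));     // 8
    }
}
```

The merge function in step 1 has the same (Long, Long) -> Long shape as the Function2 passed to reduceByKey above, and the key change happens only in step 2, after the sums have been computed.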