Suppose my DataSet is already sorted and looks like this:
[
[100.0, 1],
[105.0, 1],
[111.0, 1],
... and so on
]
The median here is 105.0. In Java I have a DataSet object that holds this data:
DataSet<Tuple2<Double, Integer>> data = ...
Is the following a correct way to compute the median of the first element of the tuples?
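public double getMedian() throws Exception {
    DataSet<Tuple2<Double, Integer>> data = ...
    // collect() brings every element back to the client as a List
    List<Tuple2<Double, Integer>> dataList = data.collect();
    int itemCount = dataList.size();
    if (itemCount == 0) {
        throw new IllegalStateException("cannot compute the median of an empty DataSet");
    }
    // the data is assumed to be sorted by f0 already
    if (itemCount % 2 == 0) {
        // even count: average the two middle values
        return (dataList.get(itemCount / 2 - 1).f0 + dataList.get(itemCount / 2).f0) / 2;
    }
    // odd count: take the single middle value
    return dataList.get(itemCount / 2).f0;
}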
In other words, is this a good approach when the code runs on a cluster?
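To check the index arithmetic separately from Flink, here is a minimal sketch in plain Java. It assumes the values have already been pulled into a sorted List<Double>; the class name MedianSketch and the method median are hypothetical names used only for illustration and are not part of Flink. It says nothing about whether collecting the data is appropriate on a cluster, only whether the even/odd indexing is right.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not part of Flink): median of an already sorted list.
final class MedianSketch {

    static double median(List<Double> sorted) {
        int n = sorted.size();
        if (n == 0) {
            // The median of no values is undefined, so fail loudly.
            throw new IllegalArgumentException("median of an empty list is undefined");
        }
        int mid = n / 2;
        if (n % 2 == 0) {
            // Even count: average the two middle values.
            return (sorted.get(mid - 1) + sorted.get(mid)) / 2.0;
        }
        // Odd count: the single middle value.
        return sorted.get(mid);
    }

    public static void main(String[] args) {
        System.out.println(median(Arrays.asList(100.0, 105.0, 111.0)));        // prints 105.0
        System.out.println(median(Arrays.asList(100.0, 105.0, 111.0, 120.0))); // prints 108.0
    }
}
```

With the three sample rows from the question this gives 105.0, which matches the expected median, so the indexing in getMedian() is consistent with this sketch as long as the collected list really is sorted.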