I have a method that computes and prints the approximate number of rows in a DataFrame:
```java
import org.apache.spark.partial.BoundedDouble;
import org.apache.spark.partial.PartialResult;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

// `spark` is a SparkSession field defined elsewhere in the enclosing class.
public static void countApprox(String csvPath, long timeout, double confidence) {
    Dataset<Row> table = spark.read().csv(csvPath);
    final PartialResult<BoundedDouble> result = table.javaRDD().countApprox(timeout, confidence);
    System.out.println("init mean value: " + result.initialValue().mean()); // is always 0
    System.out.println("init value high: " + result.initialValue().high()); // is always infinity
    System.out.println("init value low: " + result.initialValue().low());   // is always 0
    // getFinalValue() blocks until the full count is done, so wait for it on a separate thread.
    new Thread(() -> {
        System.out.println("calculating final values...");
        long initTime = System.currentTimeMillis();
        BoundedDouble finalValue = result.getFinalValue();
        System.out.println("final value high: " + finalValue.high());
        System.out.println("final value low: " + finalValue.low());
        double timeTaken = (System.currentTimeMillis() - initTime) / 1000.0;
        System.out.println("time taken: " + timeTaken + " second(s)");
    }).start();
}
```
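The problem is that no matter what timeout and confidence values I pass (with the timeout shorter than the time the actual count takes), the low, mean, and high values of `initialValue` are always the same: (0.0, 0.0, Infinity). Any help figuring out what I'm doing wrong would be greatly appreciated.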
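One possible reading of the (0.0, 0.0, Infinity) triple, assuming Spark's internal count evaluator behaves as in the 2.x/3.x sources I have seen: it returns exactly that bound whenever no partition task has finished (or every finished partition counted zero rows) at the moment the timeout fires. The sketch below is a hypothetical, stdlib-only model of that decision logic, not Spark's API. `ApproxCountModel`, `Bound`, and `currentResult` are made-up names, and the "some partitions finished" branch uses a crude `sum / fraction` extrapolation with trivial bounds instead of Spark's Pascal/Poisson confidence interval.

```java
// Hypothetical model of how an approximate count's *initial* value could be chosen.
// Only the "nothing finished" and "everything finished" cases are meant to mirror Spark;
// the partial case is a deliberately simplified stand-in.
class ApproxCountModel {

    // Mirrors the mean/low/high fields of Spark's BoundedDouble (confidence omitted).
    static final class Bound {
        final double mean;
        final double low;
        final double high;

        Bound(double mean, double low, double high) {
            this.mean = mean;
            this.low = low;
            this.high = high;
        }

        @Override
        public String toString() {
            return String.format("(mean=%s, low=%s, high=%s)", mean, low, high);
        }
    }

    // finishedCounts: row counts of partitions whose tasks completed before the timeout.
    // totalPartitions: number of partitions in the RDD being counted.
    static Bound currentResult(long[] finishedCounts, int totalPartitions) {
        long sum = 0;
        for (long c : finishedCounts) {
            sum += c;
        }
        int merged = finishedCounts.length;
        if (merged == totalPartitions) {
            // Every partition reported, so the count is exact.
            return new Bound(sum, sum, sum);
        }
        if (merged == 0 || sum == 0) {
            // No usable data yet: the evaluator reports (0, 0, +Infinity).
            return new Bound(0.0, 0.0, Double.POSITIVE_INFINITY);
        }
        // Simplified: scale up by the fraction of partitions seen; rows already
        // counted are a hard lower bound, and no upper bound is computed.
        double fraction = (double) merged / totalPartitions;
        return new Bound(sum / fraction, sum, Double.POSITIVE_INFINITY);
    }

    public static void main(String[] args) {
        // A single-partition input still being scanned when the timeout fires.
        System.out.println(currentResult(new long[0], 1));
        // 2 of 8 partitions finished, 1000 rows each.
        System.out.println(currentResult(new long[] {1000L, 1000L}, 8));
    }
}
```

Under this model, a CSV that Spark reads into a single partition would keep reporting (0.0, 0.0, Infinity) until the whole file has been scanned, whatever timeout or confidence is passed. Whether that applies here is an assumption that could be checked with `table.javaRDD().getNumPartitions()`.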
Thanks in advance!