Scala equivalent of Java's summaryStatistics

Time: 2018-01-17 11:51:51

Tags: java scala

I am porting some Java code to Scala and need to extract a few very basic statistics (count, max, min, and average) from a stream of long values.

In Java, I solved it this way:

public static Stats calcStats(Iterable<Ad> iterable) {
    LongSummaryStatistics longSummaryStatistics = StreamSupport.stream(iterable.spliterator(), false).mapToLong(Ad::getEvent_time).summaryStatistics();
    return new Stats(longSummaryStatistics.getMin(), longSummaryStatistics.getMax(), round(longSummaryStatistics.getAverage()),
            longSummaryStatistics.getCount());
}

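Is there a similar way in the Scala standard library to extract these values in a single pass (without using an additional library such as Spark)?

Right now I am using code similar to this: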

def main(args: Array[String]): Unit = {
  val l = List(("s1", 1L), ("s2", 2L), ("s3", 3L), ("s4", 4L))
  val stats = summaryStatistics(l.iterator)
  println("min: %d, max: %d, avg: %f".format(stats._1, stats._2, stats._3))
}

def summaryStatistics(iter: Iterator[(String, Long)]): (Long, Long, Double) = {
  val stats = iter.map((tuple: (String, Long)) => tuple._2)
    .foldLeft((Long.MaxValue, Long.MinValue, 0L, 0L))((a, t) => (Math.min(t, a._1), Math.max(t, a._2), a._3 + 1, a._4 + t))
  (stats._1, stats._2, stats._4 / (stats._3 * 1.0))
}

This prints:

min: 1, max: 4, avg: 2.500000

2 answers:

Answer 0: (score: 1)

You can use the Java library directly from Scala :)

import java.util.stream.StreamSupport
import scala.collection.JavaConverters._

def main(args: Array[String]): Unit = {
    val l = List(("s1", 1L), ("s2", 2L), ("s3", 3L), ("s4", 4L))
    val stats = StreamSupport.stream(l.asJava.spliterator(), false).mapToLong(x => x._2).summaryStatistics()
    println("min: %d, max: %d, avg: %f".format(stats.getMin, stats.getMax, stats.getAverage))
}

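Note the import of JavaConverters, and the small "asJava" added in the code to match the StreamSupport API :)

Answer 1: (score: 1)

In addition to C4stor's answer, you can also lean more on Scala collections, like this: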
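On newer Scala versions the same idea can be written with scala.jdk.CollectionConverters. A minimal sketch, assuming Scala 2.13 or later (where JavaConverters is deprecated); the names JavaStatsSketch and statsOf are made up for illustration and are not part of any library:

```scala
import java.util.LongSummaryStatistics
import java.util.stream.StreamSupport
import scala.jdk.CollectionConverters._ // assumption: Scala 2.13+

// Hypothetical helper object, named for this example only.
object JavaStatsSketch {
  // Wraps the Scala list as a java.util.List, streams its Long component,
  // and lets the JDK compute min, max, average and count in one pass.
  def statsOf(pairs: List[(String, Long)]): LongSummaryStatistics =
    StreamSupport
      .stream(pairs.asJava.spliterator(), false)
      .mapToLong(p => p._2)
      .summaryStatistics()
}
```

Calling JavaStatsSketch.statsOf(l) on the list from the answer returns the same LongSummaryStatistics object as the code above, so getMin, getMax, getAverage and getCount are all available.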


import java.util.LongSummaryStatistics

def main(): Unit = {
  val l = List(("s1", 1L), ("s2", 2L), ("s3", 3L), ("s4", 4L))
  // .view here is a trick to make it semantically more similar to Java Streams,
  // i.e. to avoid materialization of the mapped list
  val stats = summaryStatistics(l.view.map(_._2))
  println("min: %d, max: %d, avg: %f".format(stats.getMin, stats.getMax, stats.getAverage))
}

def summaryStatistics(col: TraversableOnce[Long]): LongSummaryStatistics = {
  col.foldLeft(new LongSummaryStatistics)((stat, el) => {
    stat.accept(el)
    stat
  })
}

Or, if you want to take advantage of the potential parallel support implemented in LongSummaryStatistics, you can use aggregate instead of foldLeft.
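A minimal sketch of what the aggregate variant could look like, not necessarily the answerer's exact code. The object name AggregateSketch and the helper names addElement and mergeStats are assumptions made for this example. It relies on LongSummaryStatistics.combine to merge partial results:

```scala
import java.util.LongSummaryStatistics

// Hypothetical object and helper names, for illustration only.
object AggregateSketch {
  // seqop: fold a single element into a partial result.
  val addElement: (LongSummaryStatistics, Long) => LongSummaryStatistics =
    (stat, el) => { stat.accept(el); stat }

  // combop: merge two partial results (only needed when partitions are
  // processed independently, e.g. by a parallel collection).
  val mergeStats: (LongSummaryStatistics, LongSummaryStatistics) => LongSummaryStatistics =
    (a, b) => { a.combine(b); a }

  // The zero value is passed by name, so each partition can start from
  // its own fresh LongSummaryStatistics instance.
  // Note: on Scala 2.13+ aggregate is deprecated for sequential collections
  // and behaves like foldLeft there.
  def summaryStatistics(col: Iterable[Long]): LongSummaryStatistics =
    col.aggregate(new LongSummaryStatistics)(addElement, mergeStats)
}
```

On a plain List the combiner is never invoked, so this behaves like the foldLeft version. The combiner only matters for parallel collections, which on Scala 2.13 live in the separate scala-parallel-collections module.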