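How can I elegantly compute summary statistics (e.g. mean and variance) per group in Scala, grouping by each distinct metric (name), as in this example?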
case class MeasureUnit(name: String, value: Double)
Seq(MeasureUnit("metric1", 0.04), MeasureUnit("metric1", 0.09),
MeasureUnit("metric2", 0.64), MeasureUnit("metric2", 0.34), MeasureUnit("metric2", 0.84))
A good example of how to compute the mean/variance per attribute is https://chrisbissell.wordpress.com/2011/05/23/a-simple-but-very-flexible-statistics-library-in-scala/ but it does not cover grouping.
Answer 0 (score: 2)
You can use Seq#groupBy:
val measureSeq : Seq[MeasureUnit] = ???
type Name = String
// "metric1" -> Seq(0.04, 0.09), "metric2" -> Seq(0.64, 0.34, 0.84)
val groupedMeasures : Map[Name, Seq[Double]] =
measureSeq
.groupBy(_.name)
.mapValues(_ map (_.value))
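One caveat about the mapValues call above: in Scala 2.12 and earlier it returns a lazy view that re-applies the function on every lookup, and in Scala 2.13 the method on Map is deprecated in favour of .view.mapValues(f).toMap. A minimal sketch of a strict alternative that should behave the same across versions is shown below; the object name StrictGrouping is only an illustrative wrapper, not part of the original answer.

```scala
object StrictGrouping {
  case class MeasureUnit(name: String, value: Double)

  // Sample data taken from the question.
  val measureSeq: Seq[MeasureUnit] = Seq(
    MeasureUnit("metric1", 0.04), MeasureUnit("metric1", 0.09),
    MeasureUnit("metric2", 0.64), MeasureUnit("metric2", 0.34), MeasureUnit("metric2", 0.84)
  )

  // Strict alternative to mapValues: the resulting Map is built once,
  // so the inner map over the values is not recomputed on each access.
  val groupedMeasures: Map[String, Seq[Double]] =
    measureSeq
      .groupBy(_.name)
      .map { case (name, units) => name -> units.map(_.value) }
}
```

On Scala 2.13 or later, measureSeq.groupMap(_.name)(_.value) expresses the same grouping in a single call.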
The grouping can then be used to compute the summary statistics:
type Mean = Double
val meanMapping : Map[Name, Mean] =
groupedMeasures mapValues { v => mean(v) }
type Variance = Double
val varianceMapping : Map[Name, Variance] =
groupedMeasures mapValues { v => variance(v) }
Or you can map each name to a tuple of statistics:
type Summary = Tuple2[Mean, Variance]
val summaryMapping : Map[Name, Summary] =
groupedMeasures mapValues {s => (mean(s), variance(s)) }
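The snippets above call mean and variance without defining them. The following is a self-contained sketch of how the pieces might fit together. The two helpers are assumptions rather than part of the original answer, and variance here is the population variance (dividing by n); divide by n - 1 instead if the sample variance is wanted. The object name SummaryStats is likewise only illustrative.

```scala
object SummaryStats {
  case class MeasureUnit(name: String, value: Double)

  // Hypothetical helper: arithmetic mean. Returns NaN for an empty sequence.
  def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Hypothetical helper: population variance (divides by n, not n - 1).
  def variance(xs: Seq[Double]): Double = {
    val m = mean(xs)
    xs.map(x => (x - m) * (x - m)).sum / xs.size
  }

  // Sample data taken from the question.
  val measures: Seq[MeasureUnit] = Seq(
    MeasureUnit("metric1", 0.04), MeasureUnit("metric1", 0.09),
    MeasureUnit("metric2", 0.64), MeasureUnit("metric2", 0.34), MeasureUnit("metric2", 0.84)
  )

  // name -> (mean, variance), computed strictly for each group.
  val summaries: Map[String, (Double, Double)] =
    measures
      .groupBy(_.name)
      .map { case (name, units) =>
        val values = units.map(_.value)
        (name, (mean(values), variance(values)))
      }

  def main(args: Array[String]): Unit =
    summaries.toSeq.sortBy(_._1).foreach { case (name, (m, v)) =>
      println(f"$name%s: mean=$m%.4f variance=$v%.6f")
    }
}
```

Both helpers traverse the values more than once, which is fine for small groups; for large or streaming data a single-pass method such as Welford's algorithm would be a reasonable substitute.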