Below is an entropy calculation based on Jeff Atwood's answer to "How to calculate the entropy of a file?", which in turn is based on http://en.wikipedia.org/wiki/Entropy_(information_theory):
object MeasureEntropy extends App {
  val s = "measure measure here measure measure measure"

  def entropyValue(s: String) = {
    val m = s.split(" ").toList.groupBy((word: String) => word).mapValues(_.length.toDouble)
    var result: Double = 0.0;
    val len = s.split(" ").length;
    m map {
      case (key, value: Double) =>
        {
          var frequency: Double = value / len;
          result -= frequency * (scala.math.log(frequency) / scala.math.log(2));
        }
    }
    result;
  }

  println(entropyValue(s))
}
I would like to improve this by removing the mutable state associated with:
var result: Double = 0.0;
How can result be folded into a single computation over the map function?
Answer 0 (score: 1)
Use foldLeft, or in this case /:, which is syntactic sugar for it:
(0d /: m) { case (result, (key, value)) =>
  val frequency = value / len
  result - frequency * (scala.math.log(frequency) / scala.math.log(2))
}
Documentation: http://www.scala-lang.org/files/archive/api/current/index.html#scala.collection.immutable.Map@/:B(op:(B,A)=> B):B
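For context, here is a minimal sketch of how the fold might sit inside the complete function. The object name EntropyFold is made up for illustration, and the named foldLeft form and groupBy(identity) are my own substitutions rather than part of the question's code; the arithmetic is unchanged.

```scala
object EntropyFold {
  // Shannon entropy (in bits) of the word distribution of `s`.
  // EntropyFold is a hypothetical name used only for this sketch.
  def entropyValue(s: String): Double = {
    val words = s.split(" ")
    val len = words.length
    // Word -> number of occurrences, as a Double so the division below is not integer division.
    val counts = words.groupBy(identity).map { case (w, ws) => w -> ws.length.toDouble }
    // The accumulator replaces the mutable `var result`.
    counts.foldLeft(0.0) { case (result, (_, count)) =>
      val frequency = count / len
      result - frequency * (math.log(frequency) / math.log(2))
    }
  }

  def main(args: Array[String]): Unit =
    println(entropyValue("measure measure here measure measure measure"))
}
```

For the sample sentence (five occurrences of "measure" and one of "here"), the result should be about 0.65 bits.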
Answer 1 (score: 1)
A simple sum does the trick:
m.map {
  case (key, value: Double) =>
    val frequency: Double = value / len;
    - frequency * (scala.math.log(frequency) / scala.math.log(2));
}.sum
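As a rough sketch, the same idea as a self-contained function might look like the following. The names EntropySum and log2 are hypothetical helpers introduced here, not part of the question. Mapping each Map entry to a plain Double yields an Iterable[Double] rather than another Map, so words with equal frequencies still contribute separate terms to the sum.

```scala
object EntropySum {
  // Hypothetical helper: base-2 logarithm.
  private def log2(x: Double): Double = math.log(x) / math.log(2)

  // Shannon entropy (in bits) of the word distribution of `s`, without mutable state.
  def entropyValue(s: String): Double = {
    val words = s.split(" ")
    val len = words.length
    val counts = words.groupBy(identity).map { case (w, ws) => w -> ws.length.toDouble }
    // One term per distinct word: -p * log2(p); the terms are then summed.
    counts.map { case (_, count) =>
      val frequency = count / len
      -frequency * log2(frequency)
    }.sum
  }

  def main(args: Array[String]): Unit = {
    // Two equally likely words: each term is 0.5, so the total should be 1 bit.
    println(entropyValue("a b"))
  }
}
```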
Answer 2 (score: 1)
It can be written with foldLeft as shown below.
def entropyValue(s: String) = {
  val m = s.split(" ").toList.groupBy((word: String) => word).mapValues(_.length.toDouble)
  val len = s.split(" ").length
  m.foldLeft(0.0)((r, t) => r - ((t._2 / len) * (scala.math.log(t._2 / len) / scala.math.log(2))))
}