我想找到在Scala中编写欧几里德距离的最快方法。经过一番尝试后,我在这里。
def euclidean[V <: Seq[Double]](dot1: V, dot2: V): Double = {
var d = 0D
var i = 0
while( i < dot1.size ) {
val toPow2 = dot1(i) - dot2(i)
d += toPow2 * toPow2
i += 1
}
sqrt(d)
}
使用mutable.ArrayBuffer[Double]
作为V
可获得最快的结果,并且没有授权collection.parallel._
使用从2到10000的各种向量大小
对于那些希望通过以下距离函数测试here的人来说比较慢:
def euclideanDV(v1: DenseVector[Double], v2: DenseVector[Double]) = norm(v1 - v2)
如果有人知道任何纯Scala代码或库可以帮助提高速度,将不胜感激。
我遵循速度测试的方式。
val te1 = 0L
val te2 = 0L
val runNumber = 100000
val warmUp = 60000
(0 until runNumber).foreach{ x =>
val t1 = System.nanoTime
euclidean1(v1, v2)
val t2 = System.nanoTime
euclidean2(v1, v2)
val t3 = System.nanoTime
if( x >= warmUp ) {
te1 += t2 - t1
te2 += t3 - t2
}
}
这是我的一些尝试
// Fast on ArrayBuffer, quadratic on List
def euclidean1[V <: Seq[Double]](v1: V, v2: V) =
{
var d = 0D
var i = 0
while( i < v1.size ){
val toPow2 = v1(i) - v2(i)
d += toPow2 * toPow2
i += 1
}
sqrt(d)
}
// Breeze test
def euclideanDV(v1: DenseVector[Double], v2: DenseVector[Double]) = norm(v1 - v2)
// Slower than euclidean1
def euclidean2[V <: Seq[Double]](v1: V, v2: V) =
{
var d = 0D
var i = 0
while( i < v1.size )
{
d += pow(v1(i) - v2(i), 2)
i += 1
}
d
}
// Slower than 1 for Vsize ~< 1000 and a bit better over 1000 on ArrayBuffer
def euclidean3[V <: Seq[Double]](v1: V, v2: V) =
{
var d = 0D
var i = 0
(0 until v1.size).foreach{ i=>
val toPow2 = v1(i) - v2(i)
d += toPow2 * toPow2
}
sqrt(d)
}
// Slower than 1 for Vsize ~< 1000 and a bit better over 1000 on ArrayBuffer
def euclidean3bis(dot1: Seq[Double], dot2: Seq[Double]): Double =
{
var sum = 0D
dot1.indices.foreach{ id =>
val toPow2 = dot1(id) - dot2(id)
sum += toPow2 * toPow2
}
sqrt(sum)
}
// Slower than 1
def euclidean4[V <: Seq[Double]](v1: V, v2: V) =
{
var d = 0D
var i = 0
val vz = v1.zip(v2)
while( i < vz.size )
{
val (a, b) = vz(i)
val toPow2 = a - b
d += toPow2 * toPow2
i += 1
}
d
}
// Slower than 1
def euclideanL1(v1: List[Double], v2: List[Double]) = sqrt(v1.zip(v2).map{ case (a, b) =>
val toPow2 = a - b
toPow2 * toPow2
}.sum)
// Slower than 1
def euclidean5(dot1: Seq[Double], dot2: Seq[Double]): Double =
{
var sum = 0D
dot1.zipWithIndex.foreach{ case (a, id) =>
val toPow2 = a - dot2(id)
sum += toPow2 * toPow2
}
sqrt(sum)
}
// super super slow
def euclidean6(v1: Seq[Double], v2: Seq[Double]) = sqrt(v1.zip(v2).map{ case (a, b) => pow(a - b, 2) }.sum)
// Slower than 1
def euclidean7(dot1: Seq[Double], dot2: Seq[Double]): Double =
{
var sum = 0D
dot1.zip(dot2).foreach{ case (a, b) => sum += pow(a - b, 2) }
sum
}
// Slower than 1
def euclidean8(v1: Seq[Double], v2: Seq[Double]) =
{
def inc(n: Int, v: Double) = {
val toPow2 = v1(n) - v2(n)
v + toPow2 * toPow2
}
@annotation.tailrec
def go(n: Int, v: Double): Double =
{
if( n < v1.size - 1 ) go(n + 1, inc(n, v))
else inc(n, v)
}
sqrt(go(0, 0D))
}
// Slower than 1
def euclideanL2(v1: List[Double], v2: List[Double]) =
{
def inc(vzz: List[(Double, Double)], v: Double): Double =
{
val (a, b) = vzz.head
val toPow2 = a - b
v + toPow2 * toPow2
}
@annotation.tailrec
def go(vzz: List[(Double, Double)], v: Double): Double =
{
if( vzz.isEmpty ) v
else go(vzz.tail, inc(vzz, v))
}
sqrt(go(v1.zip(v2), 0D))
}
答案 0 :(得分:0)
我在List
上尝试了尾递归,但在ArrayBuffer
上却没有足够有效,我完全同意以下事实:需要使用诸如 JMH 之类的适当工具来正确测试速度效率。但是,当数量级加快10-50%之间时,我们可以确信它会更好。
即使它是V <: Seq[Double]
,它也不适用于List
,但适用于ArrayLike
结构。
这是我的建议
def euclideanF[V <: Seq[Double]](v1: V, v2: V) = {
@annotation.tailrec
def go(d: Double, i: Int): Double = {
if( i < v1.size ) {
val toPow2 = v1(i) - v2(i)
go(d + toPow2 * toPow2, i + 1)
}
else d
}
sqrt(go(0D, 0))
}