Question

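I implemented an isSorted function in both Scala and Java. When I measured the execution time of the two functions, I saw that the Scala version is very slow: for 10000 Ints it runs for about 3.2 seconds, while the Java version runs for only about 10 ms! How can I make my Scala version faster?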

These are the implementations:

Scala:

def main(args: Array[String]) ={
    println(isSorted(getArray, (x:Int,y:Int) => x<y ))
}
def isSorted[A](items : Array[A], cond: (A,A) => Boolean) : Boolean = items match{
  case Array(_) => true 
  case Array(x,y) =>cond(x,y)
  case Array(_*) => cond(items(0),items(1)) && isSorted(items tail,cond)
}

Java:

public static void main(String... args){
    Sorter<Integer> sorter=new Sorter<Integer>();
    System.out.println(sorter.isSorted(sorter.getList(),new Comparator<Integer>() {
        @Override
        public int compare(Integer o1, Integer o2) {
            return o2.compareTo(o1);
        }
    }));
}
public  boolean isSorted(List<A> items, Comparator<A> cond){
    for(int i=1;i<items.size();i++){
            if(cond.compare(items.get(i-1), items.get(i)) < 0){
                return false;
            }
        }
    return true;
}

Any suggestions?

I know this is strange code :)

I want to use Scala, but this terrible performance scares me!

Answer 1

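In Scala you are making a copy of the entire array on every iteration. If you replace an O(n) algorithm with an O(n^2) one, of course it will be slower! This has nothing to do with Scala vs. Java, or even with pattern matching.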
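To make that cost concrete, here is a small sketch (the object and method names are illustrative, not from the question). It checks that Array.tail returns an independent copy, and counts how many elements the recursive version copies in total for an array of length n: (n-1) + (n-2) + ... + 1 = n(n-1)/2, which is 49,995,000 for n = 10000.

```scala
object TailCopyDemo {
  // Total elements copied by calling .tail repeatedly on an array of
  // length n until one element is left: (n-1) + (n-2) + ... + 1.
  def elementsCopied(n: Int): Long = {
    var total = 0L
    var len = n
    while (len > 1) {
      len -= 1     // each .tail produces an array one element shorter
      total += len // ...and copies every element of that shorter array
    }
    total
  }

  // True if writing into the result of .tail leaves the original array
  // unchanged, i.e. .tail allocated a fresh copy.
  def tailIsACopy(): Boolean = {
    val original = Array(1, 2, 3)
    val rest = original.tail
    rest(0) = 99
    original(1) == 2
  }
}
```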

If you want to use an algorithm based on tail, switch to a data structure that supports an efficient tail (for example, List).

def isSorted[A](items: List[A], cond: (A,A) => Boolean): Boolean = items match {
  case Nil => true
  case x :: y :: rest if (!cond(x,y)) => false
  case _ => isSorted(items.tail, cond)
}
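The List version is tail-recursive, so it runs in constant stack space. A sketch of the same function with @tailrec added (the compiler then rejects the method if the recursive call is not in tail position); the object name is an assumption. The question's array can be converted once with toList, which is a single O(n) copy.

```scala
import scala.annotation.tailrec

object ListIsSorted {
  // Same algorithm as above; @tailrec makes the compiler verify that the
  // recursion becomes a loop, so long lists cannot overflow the stack.
  @tailrec
  def isSorted[A](items: List[A], cond: (A, A) => Boolean): Boolean = items match {
    case Nil => true
    case x :: y :: _ if !cond(x, y) => false // first out-of-order pair
    case _ => isSorted(items.tail, cond)     // List.tail is O(1), no copy
  }
}
```

Usage: ListIsSorted.isSorted(getArray.toList, (x: Int, y: Int) => x < y), where getArray is the question's own helper.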

Alternatively, you can implement the same algorithm as in Java, since arrays are efficient at indexing:

def isSorted[A](items: Array[A], cond: (A,A) => Boolean): Boolean = {
  for (i <- 1 until items.length) if (!(cond(items(i-1),items(i)))) return false
  true
}
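One caveat with the for version: the return inside the for body is a non-local return out of a closure, which Scala 2 implements by throwing and catching an exception (and which recent Scala 3 versions deprecate). A plain while loop expresses the same index-based algorithm without that machinery; this is a sketch with an illustrative object name.

```scala
object WhileIsSorted {
  // Same loop as the Java version: compare each adjacent pair and
  // stop at the first pair that violates cond.
  def isSorted[A](items: Array[A], cond: (A, A) => Boolean): Boolean = {
    var i = 1
    var ok = true
    while (ok && i < items.length) {
      ok = cond(items(i - 1), items(i))
      i += 1
    }
    ok
  }
}
```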

Or, if performance is not critical, you can switch to a more generic but still O(n) algorithm:

def isSorted[A](items: Array[A], cond: (A,A) => Boolean) = 
  items.sliding(2).filter(_.length == 2).forall(x => cond(x(0),x(1)))

This is probably about 5 times slower than the index-based version.
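The filter(_.length == 2) step is needed because sliding does not pad: when the collection has fewer elements than the window size, it yields a single shorter window. A small sketch of that behaviour (the object name is an assumption):

```scala
object SlidingDemo {
  // Lengths of the windows produced by sliding(2) over an array.
  def windowLengths(items: Array[Int]): List[Int] =
    items.sliding(2).map(_.length).toList

  // The answer's check: drop any short window, then test every pair.
  def isSorted[A](items: Array[A], cond: (A, A) => Boolean): Boolean =
    items.sliding(2).filter(_.length == 2).forall(x => cond(x(0), x(1)))
}
```

For a one-element array, windowLengths returns List(1), so without the filter the check would read x(1) from a window that has only one element.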

Answer 2

Here is how I would do it, in a way that is idiomatic Scala, more flexible, and sticks to a functional style.

def isSorted[A](xs: Seq[A])(cond: (A, A) => Boolean) = {
  (xs, xs.view.drop(1)).zipped.forall { case(a,b) => cond(a,b) }
}

Use it like this:

val xs = (0 to 10000).toArray
val sorted = isSorted(xs)(_ < _) // or isSorted(xs)((a,b) => a < b) if you prefer

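This has several advantages over the original version:

- It works on any sequence, not just arrays
- It is roughly 500 times faster than the version that uses array pattern matching
  - My basic benchmark took 4.6 seconds to finish the original version and 0.008 seconds to finish the proposed one
- The code is shorter and more idiomatic
- It puts the predicate function in its own parameter list, which means its types are inferred for you instead of you having to write them by hand. You also get the _ placeholder syntax for free.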
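Note that zipped on tuples is deprecated since Scala 2.13 (in favour of lazyZip). A sketch of the same idea that avoids it by zipping two iterators over the sequence, one advanced by one position, while keeping the curried signature so _ < _ still works. The object name is an assumption, and the parameter is typed as scala.collection.Seq so both mutable and immutable sequences are accepted.

```scala
object ZipIsSorted {
  // Pair each element with its successor by zipping two iterators over xs;
  // no intermediate collection is built.
  def isSorted[A](xs: scala.collection.Seq[A])(cond: (A, A) => Boolean): Boolean =
    xs.iterator.zip(xs.iterator.drop(1)).forall { case (a, b) => cond(a, b) }
}
```

Usage: ZipIsSorted.isSorted(Vector(1, 2, 3))(_ < _) or ZipIsSorted.isSorted(List(1, 2, 2))(_ <= _).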

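Edit

With the original signature (which loses type inference for the condition function and restricts the input to arrays only), the function would look like this: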

def isSorted[A](items: Array[A], cond: (A, A) => Boolean): Boolean = {
  // pair each element with its successor; the view avoids copying the array
  (items, items.view.drop(1)).zipped.forall { case (a, b) => cond(a, b) }
}

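This seems to be several times faster than the version that uses sliding(2) to turn the array into an iterator. It is not as fast as direct indexing (a direct translation of the Java version), but it is still 500 to 600 times faster than pattern matching directly on the array.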

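I also changed it from a negated exists to a positive forall assertion. This does not affect performance, only readability (exists needs a double negation, which I find silly).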
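The two formulations are equivalent by De Morgan's law: forall(p) holds exactly when !exists(x => !p(x)). A small sketch comparing them on adjacent pairs (names are illustrative):

```scala
object ForallVsExists {
  // Adjacent pairs (a, b) of xs, produced lazily by zipping two iterators.
  private def pairs[A](xs: Seq[A]): Iterator[(A, A)] =
    xs.iterator.zip(xs.iterator.drop(1))

  // Positive form: every adjacent pair satisfies cond.
  def viaForall[A](xs: Seq[A])(cond: (A, A) => Boolean): Boolean =
    pairs(xs).forall { case (a, b) => cond(a, b) }

  // Double-negated form: there is no adjacent pair that violates cond.
  def viaExists[A](xs: Seq[A])(cond: (A, A) => Boolean): Boolean =
    !pairs(xs).exists { case (a, b) => !cond(a, b) }
}
```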

Edit 2

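I added a call to .view to avoid making a full copy of the array for the second sequence. This speeds the algorithm up slightly, bringing it close to (but not quite at) the speed of direct indexed access. If you update the function signature, the expression still generalizes to non-array inputs.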
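What .view changes: on an Array, drop(1) eagerly allocates a new array of length n-1, whereas view.drop(1) is a lazy window that keeps reading from the original array. A sketch that makes the difference visible by modifying the array after both have been created (assumes Scala 2.13 collection semantics; the names are illustrative):

```scala
object ViewDemo {
  // Returns (value seen through the copy, value seen through the view)
  // after the original array has been modified.
  def copyVersusView(): (Int, Int) = {
    val arr = Array(1, 2, 3)
    val copied = arr.drop(1)      // eager: allocates a new Array(2, 3)
    val viewed = arr.view.drop(1) // lazy: still reads from arr
    arr(1) = 42
    (copied(0), viewed(0))
  }
}
```

Here copyVersusView() should give (2, 42): the copy kept the old value, while the view sees the update because it never copied anything.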

Answer 3

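Idiomatic Scala style is not always the best choice when performance matters. In particular, Scala Array pattern matching is really slow.

However, in this case a non-idiomatic style will perform similarly to, or slightly better than, Java.

Here is a version of your isSorted algorithm that uses classic if conditions instead of pattern matching. You can run the same benchmark with this solution; there should be a big difference. Let me know if it performs better.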

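 def isSorted[A](items: Array[A], cond: (A,A) => Boolean): Boolean = {
   // arrays with 0 or 1 elements are trivially sorted
   if (items.length <= 1) true
   // cond(next, prev) == true means this pair is out of order
   else if (cond(items(1), items(0))) false
   else isSorted(items.tail, cond)
 }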
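This version still calls items.tail, which copies the remaining array on every step. A sketch that keeps the same if/else, recursive style but passes an index instead of slicing, so each call does constant work; the helper isSortedFrom is a hypothetical name, and the cond convention is the same as above.

```scala
import scala.annotation.tailrec

object IndexRecursiveIsSorted {
  def isSorted[A](items: Array[A], cond: (A, A) => Boolean): Boolean =
    isSortedFrom(items, 1, cond)

  // Checks the pairs (i-1, i), (i, i+1), ... without copying the array.
  // As above, cond(next, prev) == true means the pair is out of order.
  @tailrec
  private def isSortedFrom[A](items: Array[A], i: Int, cond: (A, A) => Boolean): Boolean =
    if (i >= items.length) true
    else if (cond(items(i), items(i - 1))) false
    else isSortedFrom(items, i + 1, cond)
}
```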

Answer 4

I just want to add another approach, which is also O(n).

def isSorted[A](items: Array[A], cond: (A,A) => Boolean): Boolean =
  (0 until items.size - 1).forall(i => cond(items(i), items(i + 1)))
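Because 0 until items.size - 1 is simply an empty range when the array has zero or one element, this version needs no special cases: forall over an empty range is true. A sketch wrapping it in an object so those edge cases can be checked (the object name is an assumption):

```scala
object RangeIsSorted {
  // Check cond on every adjacent index pair (i, i + 1).
  // For size 0 or 1 the range is empty, so the result is true.
  def isSorted[A](items: Array[A], cond: (A, A) => Boolean): Boolean =
    (0 until items.size - 1).forall(i => cond(items(i), items(i + 1)))
}
```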

Answer 5

I fully agree with @Rex-Kerr's answer. Here is a possible refinement of the array-indexing approach that checks both ends of the array on each iteration:

def isSorted[A](items: Array[A], cond: (A,A) => Boolean): Boolean = {
  val len = items.length
  for (i <- 1 until len/2+1) 
    if (!(cond(items(i-1),items(i))) || !(cond(items(len-i-1),items(len-i))) ) return false
  true
}

More generally, the initial array can be split into n segments, and the condition applied to each segment in parallel.
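A sketch of that idea using scala.concurrent.Future. The segment count, the helper names and the use of the global execution context are assumptions, not part of the answer. Each segment checks the pairs (i, i + 1) whose left index falls in its range, so pairs that straddle a segment boundary are still covered. For arrays as small as the question's 10000 Ints, the overhead of scheduling the futures may outweigh any gain.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ParallelIsSorted {
  // Sequentially checks pairs (i, i + 1) for i in [from, until).
  private def segmentSorted[A](items: Array[A], from: Int, until: Int,
                               cond: (A, A) => Boolean): Boolean = {
    var i = from
    while (i < until) {
      if (!cond(items(i), items(i + 1))) return false
      i += 1
    }
    true
  }

  // Splits the n - 1 adjacent pairs into up to `segments` contiguous
  // ranges and checks each range in its own Future.
  def isSorted[A](items: Array[A], cond: (A, A) => Boolean, segments: Int = 4): Boolean = {
    val pairs = math.max(items.length - 1, 0)
    if (pairs == 0) return true
    val k = math.max(1, math.min(segments, pairs))
    val size = (pairs + k - 1) / k // ceiling division
    val checks = (0 until pairs by size).map { from =>
      val until = math.min(from + size, pairs)
      Future(segmentSorted(items, from, until, cond))
    }
    Await.result(Future.sequence(checks), Duration.Inf).forall(identity)
  }
}
```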
