Question

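This comes up regularly: functions written with generics are significantly slower in Scala. See the example below. The type-specific version is about a third faster than the generic version. This is doubly surprising given that the generic part is outside the expensive loop. Is there a known explanation for this?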

  def xxxx_flttn[T](v: Array[Array[T]])(implicit m: Manifest[T]): Array[T] = {
    val I = v.length
    if (I <= 0) Array.ofDim[T](0)
    else {
      val J = v(0).length
      for (i <- 1 until I) if (v(i).length != J) throw new utl_err("2D matrix not symmetric. cannot be flattened. first row has " + J + " elements. row " + i + " has " + v(i).length)
      val flt = Array.ofDim[T](I * J)
      for (i <- 0 until I; j <- 0 until J) flt(i * J + j) = v(i)(j)
      flt
    }
  }
  def flttn(v: Array[Array[Double]]): Array[Double] = {
    val I = v.length
    if (I <= 0) Array.ofDim[Double](0)
    else {
      val J = v(0).length
      for (i <- 1 until I) if (v(i).length != J) throw new utl_err("2D matrix not symmetric. cannot be flattened. first row has " + J + " elements. row " + i + " has " + v(i).length)
      val flt = Array.ofDim[Double](I * J)
      for (i <- 0 until I; j <- 0 until J) flt(i * J + j) = v(i)(j)
      flt
    }
  }

Answer 1

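This is due to boxing, which happens when you apply a generic to a primitive type and use arrays containing it (or when the type appears plainly in method signatures or as a member).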

Example

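In the following trait, the parameter of the process method is erased at compile time. Because A could be any type, Array[A] becomes a plain java.lang.Object in the bytecode.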

trait Foo[A]{
  def process(as: Array[A]): Int
}

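If you choose A to be a value/primitive type such as Double, the values have to be boxed. If you write the trait non-generically instead (e.g. with A = Double), process is compiled to take an Array[Double], which is a distinct array type on the JVM (double[]). This is more efficient: an Array[Double] stores each Double directly in memory as a 64-bit value, whereas code that only sees an erased array has to handle its elements as objects, so every Double must be wrapped (boxed) into a java.lang.Double on the way in or out.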
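The erasure can be observed directly by asking the JVM, via reflection, for the parameter type of process. This is a minimal sketch assuming Scala 2.x; FooDouble and ErasureCheck are hypothetical names added only for the comparison.

```scala
// Foo is the trait from the answer; FooDouble is a hypothetical
// non-generic counterpart with A fixed to Double.
trait Foo[A] {
  def process(as: Array[A]): Int
}

trait FooDouble {
  def process(as: Array[Double]): Int
}

object ErasureCheck {
  // Returns the JVM-level (erased) type of the first parameter of `process`.
  def processParamType(cls: Class[_]): Class[_] =
    cls.getMethods.find(_.getName == "process").get.getParameterTypes.head

  def main(args: Array[String]): Unit = {
    println(processParamType(classOf[Foo[_]]))    // class java.lang.Object
    println(processParamType(classOf[FooDouble])) // class [D  (that is, double[])
  }
}
```

The generic process only receives an Object, so element access has to go through runtime type tests and boxing; the Double version receives a real double[].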

`@specialized` annotation

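If you feel adventurous, you can try the @specialized annotation (it is pretty buggy and often crashes the compiler). It makes scalac compile special versions of a class for all or selected primitive types. This only makes sense if the type parameter appears plainly in the type signature (get(a: A), but not get(as: Seq[A])) or as the type parameter of Array. I think you get a warning if specialization is pointless.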
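As a minimal illustration, assuming Scala 2.x (where scalac actually performs specialization), a trait whose type parameter appears as Array[A] can be specialized for selected primitives. Summer and DoubleSummer are hypothetical names, not part of any library.

```scala
// Hypothetical trait: scalac 2.x generates extra Int- and Double-specific
// variants of `sum` alongside the erased generic one.
trait Summer[@specialized(Int, Double) A] {
  def sum(as: Array[A]): A
}

// Extending Summer[Double] lets scalac 2.x wire in the Double-specialized
// variant, so `sum` works on the double[] without boxing.
object DoubleSummer extends Summer[Double] {
  def sum(as: Array[Double]): Double = {
    var total = 0.0
    var i = 0
    while (i < as.length) {
      total += as(i)
      i += 1
    }
    total
  }
}
```

Callers holding a Summer[Double] can then reach the Double variant of sum; whether scalac really generates and calls it is best confirmed with javap.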

Answer 2

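You can't really tell what you're measuring here (not very well, anyway), because the for loop isn't as fast as a pure while loop and the inner operation is quite inexpensive. If we rewrite the code with while loops, the key double iteration being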

  var i = 0
  while (i<I) {
    var j = 0
    while (j<J) {
      flt(i * J + j) = v(i)(j)
      j += 1
    }
    i += 1
  }
  flt

then we see that the bytecode for the generic case really is dramatically different. Non-generic:

133:    checkcast   #174; //class "[D"
136:    astore  6
138:    iconst_0
139:    istore  5
141:    iload   5
143:    iload_2
144:    if_icmpge   191
147:    iconst_0
148:    istore  4
150:    iload   4
152:    iload_3
153:    if_icmpge   182
// The stuff above implements the loop; now we do the real work
156:    aload   6
158:    iload   5
160:    iload_3
161:    imul
162:    iload   4
164:    iadd
165:    aload_1
166:    iload   5
168:    aaload             // v(i)
169:    iload   4
171:    daload             // v(i)(j)
172:    dastore            // flt(.) = _
173:    iload   4
175:    iconst_1
176:    iadd
177:    istore  4
// Okay, done with the inner work, time to jump around
179:    goto    150
182:    iload   5
184:    iconst_1
185:    iadd
186:    istore  5
188:    goto    141

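That's just a bunch of jumps and low-level operations (daload and dastore are the key ones that load and store a double from an array). If we look at the key inner part of the generic bytecode, it looks like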

160:    getstatic   #30; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
163:    aload   7
165:    iload   6
167:    iload   4
169:    imul
170:    iload   5
172:    iadd
173:    getstatic   #30; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
176:    aload_1
177:    iload   6
179:    aaload
180:    iload   5
182:    invokevirtual   #107; //Method scala/runtime/ScalaRunTime$.array_apply:(Ljava/lang/Object;I)Ljava/lang/Object;
185:    invokevirtual   #111; //Method scala/runtime/ScalaRunTime$.array_update:(Ljava/lang/Object;ILjava/lang/Object;)V
188:    iload   5
190:    iconst_1
191:    iadd
192:    istore  5

As you can see, it has to call methods to do the array apply and update. The bytecode for those is a huge mess of stuff like

2:   aload_3 
3:   instanceof      #98; //class "[Ljava/lang/Object;"
6:   ifeq    18
9:   aload_3   
10:  checkcast       #98; //class "[Ljava/lang/Object;"
13:  iload_2
14:  aaload 
15:  goto    183
18:  aload_3
19:  instanceof      #100; //class "[I"
22:  ifeq    37
25:  aload_3   
26:  checkcast       #100; //class "[I"
29:  iload_2
30:  iaload 
31:  invokestatic    #106; //Method scala/runtime/BoxesRunTime.boxToInteger:
34:  goto    183
37:  aload_3
38:  instanceof      #108; //class "[D"
41:  ifeq    56
44:  aload_3   
45:  checkcast       #108; //class "[D"
48:  iload_2
49:  daload 
50:  invokestatic    #112; //Method scala/runtime/BoxesRunTime.boxToDouble:(
53:  goto    183

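It basically has to test every type of array and, when it finds the one you're looking for, box the value. Double is pretty near the front (3rd of 10), but it's still a pretty major overhead, even if the JVM can recognize that the code ends up as box/unbox and therefore doesn't actually need to allocate memory. (I'm not sure it can do that, but even if it could, it wouldn't solve the problem.)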
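The test-then-box pattern in that bytecode corresponds roughly to a match over the possible array classes. The following is a hypothetical, simplified re-creation for illustration only: it is not the actual source of scala.runtime.ScalaRunTime.array_apply, the name arrayApply is invented, and the order of the cases after Double is arbitrary.

```scala
object ArrayDispatch {
  // Simplified sketch of generic array access on an erased array:
  // try each array class in turn, then return the element as Any.
  def arrayApply(xs: AnyRef, idx: Int): Any = xs match {
    case x: Array[AnyRef]  => x(idx) // reference arrays: element is already an object
    case x: Array[Int]     => x(idx) // the Int is boxed when returned as Any
    case x: Array[Double]  => x(idx) // likewise the Double
    case x: Array[Long]    => x(idx)
    case x: Array[Float]   => x(idx)
    case x: Array[Char]    => x(idx)
    case x: Array[Byte]    => x(idx)
    case x: Array[Short]   => x(idx)
    case x: Array[Boolean] => x(idx)
    case null              => throw new NullPointerException
    case other             => throw new IllegalArgumentException(s"not an array: $other")
  }
}
```

Every element read pays for a chain of instanceof checks plus a box, which is exactly the work the specialized Double loop avoids.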

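So, what to do? You can try [@specialized T], which will expand your code tenfold for you, as if you had written each primitive array operation yourself. Specialization is buggy in 2.9 (it should be less so in 2.10), though, so it may not work the way you'd hope. If speed is critical, then first of all write while loops instead of for loops (or at least compile with -optimise, which speeds for loops up by about a factor of two!), and then consider either specialization or writing the code by hand for the types you need.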
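Following that advice, here is a minimal sketch of the question's flattening routine written with while loops and a @specialized(Double) type parameter. It assumes Scala 2.x (as far as I know, Scala 3 does not perform this specialization), uses ClassTag (which is what generic array creation requires from 2.10 on) instead of the question's Manifest, and replaces the custom utl_err exception with require. SpecializedFlatten is a hypothetical name.

```scala
import scala.reflect.ClassTag

object SpecializedFlatten {
  // @specialized(Double) asks scalac 2.x to emit an extra copy of this
  // method with T = Double, whose array accesses need no boxing.
  def flatten[@specialized(Double) T: ClassTag](v: Array[Array[T]]): Array[T] = {
    val I = v.length
    val J = if (I > 0) v(0).length else 0
    val flt = new Array[T](I * J)
    var i = 0
    while (i < I) {
      // Reject ragged input, like the question's row-length check.
      require(v(i).length == J, s"row $i has ${v(i).length} elements, expected $J")
      var j = 0
      while (j < J) {
        flt(i * J + j) = v(i)(j)
        j += 1
      }
      i += 1
    }
    flt
  }
}
```

Whether the specialized variant is chosen depends on the static type at the call site (for an Array[Array[Double]] argument, T is inferred as Double), so it is worth checking with javap -c that the inner loop uses daload/dastore rather than ScalaRunTime.array_apply.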
