Efficient bitwise OR of two byte arrays [arrays]

Time: 2016-04-15 15:35:08

Tags: performance scala apache-spark bytearray bitwise-operators

I need to work with two very large (> 1 GB) byte arrays in Spark (so in Scala).

I'm looking for the most efficient way to do this (in terms of both speed and memory), which means I don't want to use something like the zip method, which would turn my arrays into a list.
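
For illustration, the kind of zip-based version I want to avoid would look roughly like the sketch below (the name bitorZip is only for this example); zip first materializes an intermediate array of tuples, which is exactly the overhead I would rather skip:

// sketch: zip builds an intermediate Array[(Byte, Byte)] before map runs
def bitorZip(x: Array[Byte], y: Array[Byte]): Array[Byte] =
  (x zip y).map { case (a, b) => (a | b).toByte }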

For the moment I'm using the following method, but I'd like to know whether any of you have other ideas...

def bitor(x: Array[Byte], y: Array[Byte]): Array[Byte] = {
  // OR y into x in place, one byte at a time
  for (i <- 0 until x.size) {
    x(i) = (x(i) | y(i)).toByte
  }
  x
}

Should I go through JNI and do the computation in native C?

2 Answers:

Answer 0: (score: 0)

I assume your code runs in a distributed environment. If so, I think the best option is to use the parallel collections API.

Parallel collections use the machine's multicore hardware to execute tasks, while keeping the work simple and transparent for the developer.

In my opinion, the main advantage of this approach is that if you add more hardware to your cloud service, your code is ready to benefit from it without any changes.
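
Applied directly to your bitor, a parallel-collections version might look roughly like the sketch below (bitorPar is just an illustrative name, not something I benchmarked; it assumes Scala 2.12 or earlier, where .par is part of the standard library, whereas Scala 2.13+ needs the separate scala-parallel-collections module):

// sketch: .par wraps each Array[Byte] in a ParArray, so zip and map run on a thread pool
def bitorPar(x: Array[Byte], y: Array[Byte]): Array[Byte] =
  (x.par zip y.par).map { case (a, b) => (a | b).toByte }.toArray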

I ran some tests with your code and a parallel implementation. Note that I ran these tests in the Scala REPL on my personal computer:

import scala.collection.parallel.mutable.ParArray
import scala.util.Random

// prepare arrays
val rnd = Random

// parallel arrays
val pArr1 = ParArray.tabulate(20000)(x => rnd.nextInt(100).toByte)
val pArr2 = ParArray.tabulate(20000)(x => rnd.nextInt(100).toByte)

// common arrays
val arr1 = pArr1.toArray
val arr2 = pArr2.toArray

println(pArr1)
println(pArr2)

println(arr1)
println(arr2)

println("Variables loaded")

// define parallel task
def parallel(arr1: ParArray[Byte], arr2: ParArray[Byte]): Unit = {
  val start = System.currentTimeMillis
  val r = (arr1 zip arr2).map(x => x._1 | x._2)
  //println(r)
  println(s"Execution time: ${System.currentTimeMillis - start}")
}

// define single thread task
def bitor(x: Array[Byte], y: Array[Byte]): Unit = {
  val start = System.currentTimeMillis
  for (i <- 0 until x.size) {
    x(i) = (x(i) | y(i)).toByte
  }
  //x.foreach(println)
  println(s"Execution time: ${System.currentTimeMillis - start}")
  //  return x
}

println("functions defined")

I generated 20,000 random numbers between 0 and 99 and converted them to bytes.

After that, I ran each method (parallel and single-threaded) 20 times, invoking them like this:

> (1 to 20).foreach(x => parallel(pArr1, pArr2))


// parallel method (in milliseconds)
1)  Execution time: 10
2)  Execution time: 3
3)  Execution time: 6
4)  Execution time: 4
5)  Execution time: 29
6)  Execution time: 4
7)  Execution time: 4
8)  Execution time: 3
9)  Execution time: 3
10) Execution time: 6
11) Execution time: 1
12) Execution time: 2
13) Execution time: 1
14) Execution time: 1
15) Execution time: 4
16) Execution time: 1
17) Execution time: 1
18) Execution time: 2
19) Execution time: 1
20) Execution time: 1

Avg(11 to 20) = 1.5 milliseconds

// -------------------------------------------------------------------

(1 to 20).foreach(x => bitor(arr1, arr2))

// bitor method (in milliseconds)
1)  Execution time: 1
2)  Execution time: 0
3)  Execution time: 0
4)  Execution time: 1
5)  Execution time: 0
6)  Execution time: 0
7)  Execution time: 1
8)  Execution time: 0
9)  Execution time: 0
10) Execution time: 3
11) Execution time: 0
12) Execution time: 0
13) Execution time: 0
14) Execution time: 0
15) Execution time: 2
16) Execution time: 0
17) Execution time: 3
18) Execution time: 0
19) Execution time: 1
20) Execution time: 0

Avg(11 to 20) = 0.6 milliseconds

I discarded the first ten executions to account for JIT compiler warm-up. See more here

As you can see, the bitor method is slightly faster than the parallel method. I'm not sure whether the parallel version could be optimized with a better use of the parallel API, but I think that in a distributed cloud environment the parallel approach should be faster than bitor.

Answer 1: (score: 0)

The code using foreach desugars to the equivalent of this Java code:

public final class _$$anon$1$$anonfun$bitor$1 extends AbstractFunction1$mcVI$sp implements Serializable {
    private final byte[] x$1;
    private final byte[] y$1;

    public _$$anon$1$$anonfun$bitor$1(byte[] x$1, byte[] y$1) {
        this.x$1 = x$1;
        this.y$1 = y$1;
    }

    @Override
    public final void apply(final int i) {
        this.apply$mcVI$sp(i);
    }

    @Override
    public void apply$mcVI$sp(final int i) {
        this.x$1[i] |= this.y$1[i];
    }
}


private byte[] bitor(final byte[] x, final byte[] y) {
    RichInt.until$extension0(Predef.intWrapper(0), Predef.byteArrayOps(x).size())
            .foreach(new _$$anon$1$$anonfun$bitor$1(x, y));
    return x;
}

However, things change if you replace the for comprehension with a while loop, which lets the compiler emit a plain indexed loop with no Range and no closure allocation:

def bitor(x: Array[Byte], y: Array[Byte]) : Array[Byte] = {
  var i = 0
  while (i < x.length) {
    x(i) = (x(i) | y(i)).toByte
    i += 1
  }

  x
}

which translates to:

private byte[] bitor(final byte[] x, final byte[] y) {
    for (int i = 0; i < x.length; ++i) {
        x[i] |= y[i];
    }
    return x;
}
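
As a quick sanity check (the values below are chosen purely for illustration), the while-loop bitor gives the expected result on two small arrays:

// tiny check: 0x0F | 0xF0 == 0xFF and 0x10 | 0x01 == 0x11
val a = Array[Byte](0x0F, 0x10)
val b = Array[Byte](0xF0.toByte, 0x01)
println(bitor(a, b).map(v => f"0x${v & 0xFF}%02X").mkString(", "))  // prints 0xFF, 0x11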