I don't understand the result of the Spark broadcast example

Time: 2016-08-02 13:57:43

Tags: apache-spark broadcast

I ran Spark's broadcast example. The code is:

import org.apache.spark.{SparkConf, SparkContext}

object BroadcastTest {
  def main(args: Array[String]): Unit = {

    val bcName = if (args.length > 2) args(2) else "Http"
    val blockSize = if (args.length > 3) args(3) else "4096"

    val sparkConf = new SparkConf().setAppName("Broadcast Test")
      .set("spark.broadcast.factory", s"org.apache.spark.broadcast.${bcName}BroadcastFactory")
      .set("spark.broadcast.blockSize", blockSize)
    val sc = new SparkContext(sparkConf)

    val slices = if (args.length > 0) args(0).toInt else 2
    val num = if (args.length > 1) args(1).toInt else 1000000

    val arr1 = (0 until num).toArray

    for (i <- 0 until 3) {
      println("Iteration " + i)
      println("===========")
      val startTime = System.nanoTime
      val barr1 = sc.broadcast(arr1)
      val observedSizes = sc.parallelize(1 to 10, slices).map(_ => barr1.value.size)
      // Collect the small RDD so we can print the observed sizes locally.
      observedSizes.collect().foreach(size => println(size))
      println("Iteration %d took %.0f milliseconds".format(i, (System.nanoTime - startTime) / 1E6))
    }

    sc.stop()
  }
}

After running it, the output is:

Iteration 0
===========
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
Iteration 0 took 805 milliseconds
Iteration 1
===========
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
Iteration 1 took 34 milliseconds
Iteration 2
===========
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
Iteration 2 took 32 milliseconds

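I don't understand why the first iteration (Iteration 0, 805 ms) takes so much longer than the second and third (Iteration 1 and Iteration 2, 34 ms and 32 ms). I expected all three iterations to take almost the same time. Could someone explain the reason? Thanks.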
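One way to narrow this down is to time each step of an iteration separately instead of timing the whole loop body. The sketch below is plain Scala with no Spark dependency. `TimingSketch` and `timed` are hypothetical names introduced here, not part of Spark, and the two timed steps are only stand-ins for `sc.broadcast(arr1)` and `observedSizes.collect()`. It rests on an unconfirmed assumption: that the extra time in the first iteration comes from one-time startup costs (for example class loading and JIT warm-up) rather than from the broadcast itself.

```scala
object TimingSketch {
  // Hypothetical helper (not a Spark API): evaluates `body` once and
  // returns its result together with the elapsed time in milliseconds.
  def timed[A](body: => A): (A, Double) = {
    val start = System.nanoTime
    val result = body
    (result, (System.nanoTime - start) / 1e6)
  }

  def main(args: Array[String]): Unit = {
    val arr1 = (0 until 1000000).toArray

    for (i <- 0 until 3) {
      // Stand-in for the broadcast step; in the Spark program this
      // would wrap sc.broadcast(arr1).
      val (copy, step1Ms) = timed(arr1.clone())

      // Stand-in for the job; in the Spark program this would wrap
      // observedSizes.collect().
      val (sizes, step2Ms) = timed(Seq.fill(10)(copy.length))

      println(f"Iteration $i: step 1 took $step1Ms%.1f ms, step 2 took $step2Ms%.1f ms (${sizes.size} sizes)")
    }
  }
}
```

Wrapping the real `sc.broadcast` and `collect()` calls the same way would show which of the two steps accounts for the difference between the first iteration and the later ones.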

0 answers:

No answers yet