I ran Spark's broadcast example. The code is:
import org.apache.spark.{SparkConf, SparkContext}

object BroadcastTest {
  def main(args: Array[String]): Unit = {
    // Optional arguments: <slices> <num> <broadcast factory name> <block size>
    val bcName = if (args.length > 2) args(2) else "Http"
    val blockSize = if (args.length > 3) args(3) else "4096"
    val sparkConf = new SparkConf().setAppName("Broadcast Test")
      .set("spark.broadcast.factory", s"org.apache.spark.broadcast.${bcName}BroadcastFactory")
      .set("spark.broadcast.blockSize", blockSize)
    val sc = new SparkContext(sparkConf)
    val slices = if (args.length > 0) args(0).toInt else 2
    val num = if (args.length > 1) args(1).toInt else 1000000
    val arr1 = (0 until num).toArray
    for (i <- 0 until 3) {
      println("Iteration " + i)
      println("===========")
      val startTime = System.nanoTime
      // Broadcast the array, then read its size inside each task.
      val barr1 = sc.broadcast(arr1)
      val observedSizes = sc.parallelize(1 to 10, slices).map(_ => barr1.value.size)
      // Collect the small RDD so we can print the observed sizes locally.
      observedSizes.collect().foreach(size => println(size))
      println("Iteration %d took %.0f milliseconds".format(i, (System.nanoTime - startTime) / 1E6))
    }
    sc.stop()
  }
}
After running it, the output was:
Iteration 0
===========
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
Iteration 0 took 805 milliseconds
Iteration 1
===========
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
Iteration 1 took 34 milliseconds
Iteration 2
===========
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
1000000
Iteration 2 took 32 milliseconds
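I don't understand why the first iteration (Iteration 0, 805 milliseconds) takes so much longer than the second and third (34 and 32 milliseconds). I expected all three iterations to take roughly the same time. Could someone explain the reason? Thanks.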