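Spark version: 1.3.1
Cluster: Mesos 0.22.0
Scala version: 2.10.4
I'm seeing jobs run on my cluster when I invoke cache on an RDD. I had expected that the last line of the code below would not trigger any work on the cluster. Are there conditions under which cache triggers work on the cluster?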
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
// work is done to load the json into the dataframe
val people = sc.parallelize(
"""{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil
)
val peopleDF = sqlContext.jsonRDD(people).toDF()
// No work is done for the orderBy, as expected
val orderBy = peopleDF.orderBy("name")
// Jobs run when cache is invoked; I expected nothing to run on the cluster
val orderByCache = orderBy.cache