
Spark version: 1.3.1
Cluster: Mesos 0.22.0
Scala version: 2.10.4

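I am seeing jobs run on my cluster when I call cache on an RDD. I expected the last line of the code below not to trigger any cluster work. Is there a condition under which cache triggers work on the cluster?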

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
// work is done to load the json into the dataframe
val people = sc.parallelize(
  """{"name":"Yin","address":{"city":"Columbus","state":"Ohio"}}""" :: Nil
)
val peoplDF = sqlContext.jsonRDD(people).toDF()
// No work is done for the orderBy, as expected
val orderBy = peoplDF.orderBy("name")
// Jobs are run when invoking cache, expectation was nothing would run on the cluster
val orderByCache = orderBy.cache

orderBy + cache triggering jobs on the Mesos cluster

0 answers: