I just installed spark-notebook on some old PCs that run a test Spark cluster. I created a notebook from the following template:
{
  "Simple" : {
    "profile" : "standalone",
    "name" : "Simple Test Spark Cluster",
    "status" : "stopped",
    "template" : {
      "customLocalRepo" : null,
      "customRepos" : null,
      "customDeps" : null,
      "customImports" : null,
      "customSparkConf" : {
        "spark.app.name" : "Notebook",
        "spark.master" : "spark://mymaster:7077",
        "spark.eventLog.enabled" : "true",
        "spark.eventLog.dir" : "hdfs://mymaster:8020/var/log/spark",
        "spark.shuffle.service.enabled" : "true",
        "spark.dynamicAllocation.enabled" : "true"
      }
    }
  }
}
First I created some dummy data:
val someValues = sc.parallelize(1L to 10000L)

case class Foo(key: Long, value: Long)

val someDummyGrouping = someValues
  .map(v => (v / 100) -> v)
  .reduceByKey((a, b) => a + b)
  .map(t => Foo(t._1, t._2))
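As a sanity check that does not depend on the cluster, the same grouping logic can be reproduced with plain Scala collections. This is only a local sketch (the `GroupingSketch` object name is mine, not part of the notebook):

```scala
// Local, Spark-free sketch of the same grouping: bucket each value
// by v / 100 and sum within each bucket (mirrors reduceByKey(_ + _)).
object GroupingSketch {
  def main(args: Array[String]): Unit = {
    val someValues = (1L to 10000L).toSeq
    val grouped: Map[Long, Long] = someValues
      .groupBy(v => v / 100)
      .map { case (k, vs) => k -> vs.sum }
    // Bucket 0 holds 1..99, so its sum is 99 * 100 / 2 = 4950.
    println(grouped(0L))
  }
}
```

Running this prints 4950, so the grouped values themselves are not the problem.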
Now, following the instructions on the spark-notebook GitHub page, I register a temporary table:
import org.apache.spark.sql.SQLContext
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._
someDummyGrouping.toDF.registerTempTable("foo")
The output is:
import org.apache.spark.sql.SQLContext
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@5c42083d
import sqlContext.implicits._
However, when I try to query the table, I don't get any useful output:
:sql select * from foo
The output is:
import notebook.front.widgets.Sql
import notebook.front.widgets.Sql._
res13: notebook.front.widgets.Sql = <Sql widget>
[key: bigint, value: bigint]
When I run the same code on a spark-notebook installation on AWS, it works out of the box.
Did I forget to configure something, and if so, what am I missing?