I am trying to connect Zeppelin with Highcharts.
%spark
import com.knockdata.zeppelin.highcharts._
import com.knockdata.zeppelin.highcharts.model._
import sqlContext.implicits._
val Tokyo = Seq(7.0, 6.9, 9.5, 14.5, 18.2, 21.5, 25.2, 26.5, 23.3,
18.3, 13.9, 9.6).map(("Tokyo", _))
val df = (Tokyo).toDF("city", "temperature")
df.show()
highcharts(df.seriesCol("city").series("y" -> col("temperature"))).plot()
This gives:
import com.knockdata.zeppelin.highcharts._
import com.knockdata.zeppelin.highcharts.model._
import sqlContext.implicits._
Tokyo: Seq[(String, Double)] = List((Tokyo,7.0), (Tokyo,6.9), (Tokyo,9.5), (Tokyo,14.5), (Tokyo,18.2), (Tokyo,21.5), (Tokyo,25.2), (Tokyo,26.5), (Tokyo,23.3), (Tokyo,18.3), (Tokyo,13.9), (Tokyo,9.6))
df: org.apache.spark.sql.DataFrame = [city: string, temperature: double]
+-----+-----------+
| city|temperature|
+-----+-----------+
|Tokyo| 7.0|
|Tokyo| 6.9|
|Tokyo| 9.5|
|Tokyo| 14.5|
|Tokyo| 18.2|
|Tokyo| 21.5|
|Tokyo| 25.2|
|Tokyo| 26.5|
|Tokyo| 23.3|
|Tokyo| 18.3|
|Tokyo| 13.9|
|Tokyo| 9.6|
+-----+-----------+
<console>:201: error: value seriesCol is not a member of org.apache.spark.sql.DataFrame
highcharts(df.seriesCol("city").series("y" -> col("temperature"))).plot()
I added the dependency artifact com.knockdata:zeppelin-highcharts:0.2 to the Spark interpreter.
I followed https://github.com/knockdata/zeppelin-highcharts/blob/master/docs/DemoLineChart.md and, using "Are there better interface to add Highcharts support to Zeppelin", tried the bank data, but got:
<console>:224: error: value series is not a member of org.apache.spark.rdd.RDD[Bank]
possible cause: maybe a semicolon is missing before `value series'?
.series("x" -> "age", "y" -> avg(col("income")))
Please help: where am I going wrong, and what could the problem be? Thanks in advance.
Answer 0: (score: 0)
A DataFrame can be implicitly converted to a SeriesHolder, which has the function seriesCol. This was added in version 0.6.0.
df.seriesCol("city")
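The error is most likely caused by using the wrong version of spark-highcharts. The example code (doc) corresponds to version 0.6.0 (the version number maps directly to the Zeppelin version).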
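To illustrate why the method can be "not a member" in one version and available in another, here is a minimal sketch of the general Scala mechanism of adding a method through an implicit conversion. It is not the library's actual code: `Table`, `ChartSyntax`, `TableOps` and `Demo` are hypothetical stand-ins, and only the method name `seriesCol` is borrowed from the question.

```scala
// Hypothetical stand-in for a DataFrame; not a Spark class.
final case class Table(columns: Seq[String])

object ChartSyntax {
  // With ChartSyntax._ imported, the compiler can rewrite
  // table.seriesCol(...) into new TableOps(table).seriesCol(...).
  implicit class TableOps(private val table: Table) {
    def seriesCol(name: String): String = {
      require(table.columns.contains(name), s"unknown column: $name")
      s"series split by $name"
    }
  }
}

object Demo {
  import ChartSyntax._

  // Compiles only because the implicit class above is in scope.
  def describe(): String = Table(Seq("city", "temperature")).seriesCol("city")
}
```

If the import is missing, or the jar on the classpath does not contain such an implicit, the compiler reports that `seriesCol` is not a member of `Table`, which has the same shape as the error in the question.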
Using Docker may be the easiest way:
docker run -p 8080:8080 -d knockdata/zeppelin-highcharts
Alternatively, set it up in a way similar to the Dockerfile.
Answer 1: (score: 0)
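I changed the dependency artifact in the Spark interpreter from com.knockdata:zeppelin-highcharts:0.2
to com.knockdata:zeppelin-highcharts:0.6.0,
which solved the problem. The bank data problem still persists, though. Any help with this?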
%spark
import com.knockdata.zeppelin.highcharts._
import com.knockdata.zeppelin.highcharts.model._
import sqlContext.implicits._
val bankText = sc.textFile("/home/priyanka/Downloads/bank-data.csv")
case class Bank(age:Integer, region:String, income : Float, married : String, children : Integer, car:String, save_act:String, current_act : String, mortgage : String, pep : String)
// split each line, filter out header (starts with "age"), and map it into Bank case class
val bank = bankText.map(s=>s.split(",")).filter(s=>s(0)!="age").map(
s=>Bank(s(0).toInt,
s(1).replaceAll("\"", ""),
s(2).replaceAll("\"", "").toFloat,
s(3).replaceAll("\"", ""),
s(4).replaceAll("\"", "").toInt,
s(5).replaceAll("\"", ""),
s(6).replaceAll("\"", ""),
s(7).replaceAll("\"", ""),
s(8).replaceAll("\"", ""),
s(9).replaceAll("\"", "")
)
)
// convert to DataFrame and register it as a temporary table
bank.toDF().registerTempTable("bank")
highcharts(bank.series("x" -> "age", "y" -> avg(col("income"))).orderBy(col("age"))).plot()
This gives:
import com.knockdata.zeppelin.highcharts._
import com.knockdata.zeppelin.highcharts.model._
import sqlContext.implicits._
bankText: org.apache.spark.rdd.RDD[String] = MapPartitionsRDD[49] at textFile at <console>:62
defined class Bank
bank: org.apache.spark.rdd.RDD[Bank] = MapPartitionsRDD[52] at map at <console>:66
<console>:70: error: value series is not a member of org.apache.spark.rdd.RDD[Bank]
possible cause: maybe a semicolon is missing before `value series'?
.series("x" -> "age", "y" -> avg(col("income")))
^
Thanks.
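One possible reading of this error: `bank` is still an `RDD[Bank]`, because `bank.toDF()` returns a new DataFrame and leaves `bank` unchanged, while `series` appears to be provided only for DataFrames through the implicit conversion described in Answer 0. A minimal sketch of a fix, assuming the paragraph above has already run in the same Zeppelin note and that version 0.6.0 exposes `series` on DataFrames as the linked demo suggests; `bankDF` is a name introduced here for illustration:

```scala
import org.apache.spark.sql.functions.{avg, col}

// bank is the RDD[Bank] defined in the paragraph above;
// keep the converted DataFrame in its own value instead of discarding it.
val bankDF = bank.toDF()
bankDF.registerTempTable("bank")

// Assumption: series is available on a DataFrame, as in the linked demo.
highcharts(bankDF.series("x" -> "age", "y" -> avg(col("income"))).orderBy(col("age"))).plot()
```

This has not been verified against the library; it only changes the receiver of `series` from the RDD to the DataFrame.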