%spark.r interpreter does not work in Zeppelin 0.6.1

Date: 2016-08-25 03:57:03

Tags: apache-spark apache-spark-sql sparkr apache-zeppelin

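I am running a Spark 1.6.2 cluster with Hadoop YARN and Oozie. I have installed Zeppelin 0.6.1 (the binary package with all interpreters: zeppelin-0.6.1-bin-all.tgz). When I try to run a SparkR script with the %spark.r interpreter: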

%spark.r
# Create the SparkContext and connect to the Cloudant DB
sc1 <- sparkR.init(sparkEnv = list("cloudant.host" = "host_name", "cloudant.username" = "user_name", "cloudant.password" = "password", "jsonstore.rdd.schemaSampleSize" = "-1"))

# Database to be connected to extract the data
database <- "sensordata"
# Creating Spark SQL Context
sqlContext <- sparkRSQL.init(sc1)
# Creating DataFrame for the "sensordata" Cloudant DB
sensorDataDF <- read.df(sqlContext, database, header = "true", source = "com.cloudant.spark", inferSchema = "true")
# Get basic information about the DataFrame(sensorDataDF)
printSchema(sensorDataDF)

I get the following error (log):

ERROR [2016-08-25 03:28:37,336] ({Thread-77} JobProgressPoller.java[run]:54) - Can not get or update progress
org.apache.zeppelin.interpreter.InterpreterException: org.apache.thrift.transport.TTransportException
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:373)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:111)
at org.apache.zeppelin.notebook.Paragraph.progress(Paragraph.java:237)
at org.apache.zeppelin.scheduler.JobProgressPoller.run(JobProgressPoller.java:51)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.recv_getProgress(RemoteInterpreterService.java:296)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Client.getProgress(RemoteInterpreterService.java:281)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreter.getProgress(RemoteInterpreter.java:370)
... 3 more

Any help is much appreciated.

2 answers:

Answer 0 (score: 0)

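I ran into the same problem after migrating to 0.6.1. The problem is that Zeppelin is built with Scala 2.11, while Apache Spark 1.6.2 is built with Scala 2.10. You need to either build Spark 1.6.x with Scala 2.11 or migrate your Spark code to 2.0.0.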

Answer 1 (score: 0)

Setting the master to local[2] in the interpreter settings fixed the problem for me. This was originally suggested by vgunnu:

"Try setting the spark master to local[2]; if that works, you are probably missing some environment variables in the env file" - vgunnu, Aug 25 at 4:37
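
For reference, the "env file" in that comment is most likely conf/zeppelin-env.sh. A minimal sketch of the relevant entries might look like the following. MASTER, SPARK_HOME and HADOOP_CONF_DIR are variables that Zeppelin reads from this file; the two paths are placeholders and are assumptions, not values from this thread, so adjust them to your installation.

```shell
# conf/zeppelin-env.sh (sketch; the paths below are hypothetical placeholders)

# Run Spark locally with 2 threads to rule out cluster-side configuration issues.
# Quoted so the shell does not treat the brackets as a glob pattern.
export MASTER="local[2]"

# Point Zeppelin at an external Spark installation (assumed location).
export SPARK_HOME="/opt/spark"

# Needed once the master is switched back to YARN (assumed location).
export HADOOP_CONF_DIR="/etc/hadoop/conf"
```

If the paragraph runs with local[2] but fails again after the master is set back to yarn-client, the remaining difference is probably in these environment settings. Zeppelin needs to be restarted after the file is edited.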