I am running the following:
- Spark standalone cluster (pre-built: http://d3kbcqa49mib13.cloudfront.net/spark-1.5.1-bin-hadoop2.6.tgz)
- Zeppelin 0.5.0 (tag: https://github.com/apache/incubator-zeppelin/releases/tag/v0.5.0)
- Oracle JDK 8u66
I can launch a spark-shell from any of the Spark cluster machines.
I installed Zeppelin as follows (https://zeppelin.incubator.apache.org/docs/install/install.html):
git clone https://github.com/apache/incubator-zeppelin zeppelin
cd zeppelin
git checkout tags/v0.5.0
mvn install -DskipTests -Dspark.version=1.5.1 -Dhadoop.version=2.6.0
I have configured zeppelin-env.sh as follows:
export JAVA_HOME="/home/spark/java"
export MASTER="spark://master:7077"
export ZEPPELIN_JAVA_OPTS="-Dspark.executor.memory=2g -Dspark.cores.max=8"
export ZEPPELIN_MEM="-Xmx2048m -XX:MaxPermSize=512m"
export SPARK_HOME=/home/spark/spark
export SPARK_CONF_DIR=/home/spark/spark/conf
Note that SPARK_HOME points at the same Spark version that runs on the cluster nodes.
Now I create my first note and test the connection to my running cluster:
%spark
val ctx = new org.apache.spark.sql.SQLContext(sc)
I get the following error:
ERROR [2015-11-09 12:02:40,172] ({pool-1-thread-3} ProcessFunction.java[process]:41) - Internal error processing getProgress
org.apache.zeppelin.interpreter.InterpreterException: akka.ConfigurationException: Akka JAR version [2.3.11] does not match the provided config version [2.3.4]
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:75)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.open(LazyOpenInterpreter.java:68)
at org.apache.zeppelin.interpreter.LazyOpenInterpreter.getProgress(LazyOpenInterpreter.java:109)
at org.apache.zeppelin.interpreter.remote.RemoteInterpreterServer.getProgress(RemoteInterpreterServer.java:299)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:938)
at org.apache.zeppelin.interpreter.thrift.RemoteInterpreterService$Processor$getProgress.getResult(RemoteInterpreterService.java:923)
at org.apache.thrift.ProcessFunction.process(ProcessFunction.java:39)
at org.apache.thrift.TBaseProcessor.process(TBaseProcessor.java:39)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:206)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: akka.ConfigurationException: Akka JAR version [2.3.11] does not match the provided config version [2.3.4]
at akka.actor.ActorSystem$Settings.<init>(ActorSystem.scala:210)
at akka.actor.ActorSystemImpl.<init>(ActorSystem.scala:505)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:142)
at akka.actor.ActorSystem$.apply(ActorSystem.scala:119)
at org.apache.spark.util.AkkaUtils$.org$apache$spark$util$AkkaUtils$$doCreateActorSystem(AkkaUtils.scala:121)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:53)
at org.apache.spark.util.AkkaUtils$$anonfun$1.apply(AkkaUtils.scala:52)
at org.apache.spark.util.Utils$$anonfun$startServiceOnPort$1.apply$mcVI$sp(Utils.scala:1913)
at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:141)
at org.apache.spark.util.Utils$.startServiceOnPort(Utils.scala:1904)
at org.apache.spark.util.AkkaUtils$.createActorSystem(AkkaUtils.scala:55)
at org.apache.spark.rpc.akka.AkkaRpcEnvFactory.create(AkkaRpcEnv.scala:253)
at org.apache.spark.rpc.RpcEnv$.create(RpcEnv.scala:53)
at org.apache.spark.SparkEnv$.create(SparkEnv.scala:252)
at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:193)
at org.apache.spark.SparkContext.createSparkEnv(SparkContext.scala:277)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:450)
at org.apache.zeppelin.spark.SparkInterpreter.createSparkContext(SparkInterpreter.java:301)
at org.apache.zeppelin.spark.SparkInterpreter.getSparkContext(SparkInterpreter.java:146)
at org.apache.zeppelin.spark.SparkInterpreter.open(SparkInterpreter.java:423)
at org.apache.zeppelin.interpreter.ClassloaderInterpreter.open(ClassloaderInterpreter.java:73)
... 11 more
Does anyone know what I am doing wrong?
Answer 0 (score: 1)
In version 1.5.1, Spark uses Akka 2.3.11. Zeppelin 0.5.0 does not contain the corresponding change.
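As a purely illustrative sanity check (not part of the fix), the two conflicting versions can be pulled straight out of the log line quoted above with a couple of sed expressions:

```shell
# Illustrative only: extract both Akka versions from the error message
# in the stack trace, confirming the JAR vs. config mismatch.
err='Akka JAR version [2.3.11] does not match the provided config version [2.3.4]'
jar_ver=$(printf '%s\n' "$err" | sed -n 's/.*JAR version \[\([^]]*\)\].*/\1/p')
cfg_ver=$(printf '%s\n' "$err" | sed -n 's/.*config version \[\([^]]*\)\].*/\1/p')
echo "JAR: $jar_ver, config: $cfg_ver"
```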
If possible, it is better to use version 0.5.5 (released 4 days ago), since it already contains a spark-1.5 profile with all the required dependencies:
https://github.com/apache/incubator-zeppelin/blob/v0.5.5/spark-dependencies/pom.xml#L459
Also, using a Spark profile in Zeppelin (rather than the spark.version property) automatically sets everything else to the correct versions:
mvn clean install -Pspark-1.5 -Dhadoop.version=2.6.0 -Phadoop-2.6 -DskipTests
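For reference, the profile activated by -Pspark-1.5 pins the dependency versions in one place. The fragment below is an illustrative sketch, not a verbatim copy; the linked spark-dependencies/pom.xml is the authoritative source for the exact values:

```xml
<!-- Illustrative sketch of what a spark-1.5 profile pins; see the linked
     spark-dependencies/pom.xml for the authoritative property list. -->
<profile>
  <id>spark-1.5</id>
  <properties>
    <spark.version>1.5.1</spark.version>
    <!-- Matches the Akka version that Spark 1.5.x itself is built against -->
    <akka.version>2.3.11</akka.version>
  </properties>
</profile>
```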
更新1:
It seems that v0.5.5 depends on a SNAPSHOT version of apache-jar-resource-bundle. Because of this change, the Apache snapshots repository should also be added to your Maven settings (~/.m2/settings.xml):
<profiles>
  <profile>
    ...
    <repositories>
      ...
      <repository>
        <id>apache-snapshots</id>
        <name>apache-snapshots</name>
        <releases>
          <enabled>false</enabled>
          <updatePolicy>never</updatePolicy>
          <checksumPolicy>fail</checksumPolicy>
        </releases>
        <snapshots>
          <enabled>true</enabled>
          <updatePolicy>daily</updatePolicy>
          <checksumPolicy>fail</checksumPolicy>
        </snapshots>
        <url>http://repository.apache.org/snapshots/</url>
      </repository>
      ...
    </repositories>
    ...
  </profile>
</profiles>