I installed Spark using Cloudera Manager, then configured and started the Spark service with the following commands:
/opt/cloudera/parcels/SPARK/lib/spark/sbin/start-master.sh
/opt/cloudera/parcels/SPARK/lib/spark/sbin/start-slaves.sh
Then I wanted to run WordCount to test my Spark installation. First, I started spark-shell on the master node:
15/07/28 13:44:25 INFO spark.HttpServer: Starting HTTP Server
15/07/28 13:44:25 INFO server.Server: jetty-7.6.8.v20121106
15/07/28 13:44:25 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:45213
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 0.9.0
      /_/
Using Scala version 2.10.3 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_51)
Type in expressions to have them evaluated.
Type :help for more information.
15/07/28 13:44:31 INFO slf4j.Slf4jLogger: Slf4jLogger started
15/07/28 13:44:32 INFO Remoting: Starting remoting
15/07/28 13:44:32 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://spark@hadoop241:45741]
15/07/28 13:44:32 INFO Remoting: Remoting now listens on addresses: [akka.tcp://spark@hadoop241:45741]
15/07/28 13:44:32 INFO spark.SparkEnv: Registering BlockManagerMaster
15/07/28 13:44:32 INFO storage.DiskBlockManager: Created local directory at /tmp/spark-local-20150728134432-ac8c
15/07/28 13:44:32 INFO storage.MemoryStore: MemoryStore started with capacity 294.9 MB.
15/07/28 13:44:32 INFO network.ConnectionManager: Bound socket to port 56158 with id = ConnectionManagerId(hadoop241,56158)
15/07/28 13:44:32 INFO storage.BlockManagerMaster: Trying to register BlockManager
15/07/28 13:44:32 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop241:56158 with 294.9 MB RAM
15/07/28 13:44:32 INFO storage.BlockManagerMaster: Registered BlockManager
15/07/28 13:44:32 INFO spark.HttpServer: Starting HTTP Server
15/07/28 13:44:32 INFO server.Server: jetty-7.6.8.v20121106
15/07/28 13:44:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:39279
15/07/28 13:44:32 INFO broadcast.HttpBroadcast: Broadcast server started at http://192.168.2.241:39279
15/07/28 13:44:32 INFO spark.SparkEnv: Registering MapOutputTracker
15/07/28 13:44:32 INFO spark.HttpFileServer: HTTP File server directory is /tmp/spark-06dad7a7-d1fb-433d-bbab-37f20fb02057
15/07/28 13:44:32 INFO spark.HttpServer: Starting HTTP Server
15/07/28 13:44:32 INFO server.Server: jetty-7.6.8.v20121106
15/07/28 13:44:32 INFO server.AbstractConnector: Started SocketConnector@0.0.0.0:46380
15/07/28 13:44:32 INFO server.Server: jetty-7.6.8.v20121106
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage/rdd,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/storage,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/stage,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages/pool,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/stages,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/environment,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/executors,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/metrics/json,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/static,null}
15/07/28 13:44:32 INFO handler.ContextHandler: started o.e.j.s.h.ContextHandler{/,null}
15/07/28 13:44:32 INFO server.AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/07/28 13:44:32 INFO ui.SparkUI: Started Spark Web UI at http://hadoop241:4040
15/07/28 13:44:32 INFO client.AppClient$ClientActor: Connecting to master spark://192.168.2.241:7077...
Created spark context..
Spark context available as sc.
scala> 15/07/28 13:44:33 INFO cluster.SparkDeploySchedulerBackend: Connected to Spark cluster with app ID app-20150728134433-0001
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor added: app-20150728134433-0001/0 on worker-20150724192744-hadoop246-7078 (hadoop246:7078) with 16 cores
15/07/28 13:44:33 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150728134433-0001/0 on hostPort hadoop246:7078 with 16 cores, 512.0 MB RAM
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor added: app-20150728134433-0001/1 on worker-20150724132945-hadoop241-7078 (hadoop241:7078) with 8 cores
15/07/28 13:44:33 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150728134433-0001/1 on hostPort hadoop241:7078 with 8 cores, 512.0 MB RAM
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor added: app-20150728134433-0001/2 on worker-20150724132947-hadoop245-7078 (hadoop245:7078) with 8 cores
15/07/28 13:44:33 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150728134433-0001/2 on hostPort hadoop245:7078 with 8 cores, 512.0 MB RAM
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor added: app-20150728134433-0001/3 on worker-20150724132949-hadoop254-7078 (hadoop254:7078) with 8 cores
15/07/28 13:44:33 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150728134433-0001/3 on hostPort hadoop254:7078 with 8 cores, 512.0 MB RAM
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor added: app-20150728134433-0001/4 on worker-20150724183923-hadoop217-7078 (hadoop217:7078) with 8 cores
15/07/28 13:44:33 INFO cluster.SparkDeploySchedulerBackend: Granted executor ID app-20150728134433-0001/4 on hostPort hadoop217:7078 with 8 cores, 512.0 MB RAM
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor updated: app-20150728134433-0001/3 is now RUNNING
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor updated: app-20150728134433-0001/4 is now RUNNING
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor updated: app-20150728134433-0001/1 is now RUNNING
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor updated: app-20150728134433-0001/2 is now RUNNING
15/07/28 13:44:33 INFO client.AppClient$ClientActor: Executor updated: app-20150728134433-0001/0 is now RUNNING
15/07/28 13:44:35 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop241:60944/user/Executor#1370617929] with ID 1
15/07/28 13:44:36 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop241:38177 with 294.9 MB RAM
15/07/28 13:44:37 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop217:45179/user/Executor#357014410] with ID 4
15/07/28 13:44:38 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop217:32361 with 294.9 MB RAM
15/07/28 13:44:38 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop254:4899/user/Executor#-432875177] with ID 3
15/07/28 13:44:38 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop245:54837/user/Executor#2060262779] with ID 2
15/07/28 13:44:38 INFO cluster.SparkDeploySchedulerBackend: Registered executor: Actor[akka.tcp://sparkExecutor@hadoop246:41470/user/Executor#296060469] with ID 0
15/07/28 13:44:38 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop245:11915 with 294.9 MB RAM
15/07/28 13:44:39 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop246:55377 with 294.9 MB RAM
15/07/28 13:44:39 INFO storage.BlockManagerMasterActor$BlockManagerInfo: Registering block manager hadoop254:48560 with 294.9 MB RAM
val file = sc.textFile("hdfs://192.168.2.241:8020/root/workspace/testfile")
Up to this step there was no problem, but the next step failed:
val counts = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
I got:
java.lang.NoClassDefFoundError: com/google/protobuf/ServiceException
at org.apache.hadoop.ipc.ProtobufRpcEngine.<clinit>(ProtobufRpcEngine.java:64)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:270)
at org.apache.hadoop.conf.Configuration.getClassByNameOrNull(Configuration.java:1713)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:1678)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:1772)
at org.apache.hadoop.ipc.RPC.getProtocolEngine(RPC.java:201)
at org.apache.hadoop.ipc.RPC.getProtocolProxy(RPC.java:522)
at org.apache.hadoop.hdfs.NameNodeProxies.createNNProxyWithClientProtocol(NameNodeProxies.java:347)
at org.apache.hadoop.hdfs.NameNodeProxies.createNonHAProxy(NameNodeProxies.java:168)
at org.apache.hadoop.hdfs.NameNodeProxies.createProxy(NameNodeProxies.java:129)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:448)
at org.apache.hadoop.hdfs.DFSClient.<init>(DFSClient.java:410)
at org.apache.hadoop.hdfs.DistributedFileSystem.initialize(DistributedFileSystem.java:128)
at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2308)
at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:87)
at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2342)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2324)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:351)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:163)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:575)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:363)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:336)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:391)
at org.apache.spark.SparkContext$$anonfun$15.apply(SparkContext.scala:391)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:111)
at org.apache.spark.rdd.HadoopRDD$$anonfun$getJobConf$1.apply(HadoopRDD.scala:111)
at scala.Option.map(Option.scala:145)
at org.apache.spark.rdd.HadoopRDD.getJobConf(HadoopRDD.scala:111)
at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:133)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.FlatMappedRDD.getPartitions(FlatMappedRDD.scala:30)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.rdd.MappedRDD.getPartitions(MappedRDD.scala:28)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:207)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
at scala.Option.getOrElse(Option.scala:120)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:205)
at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:58)
at org.apache.spark.rdd.PairRDDFunctions.reduceByKey(PairRDDFunctions.scala:354)
at $iwC$$iwC$$iwC$$iwC.<init>(<console>:14)
at $iwC$$iwC$$iwC.<init>(<console>:19)
at $iwC$$iwC.<init>(<console>:21)
at $iwC.<init>(<console>:23)
at <init>(<console>:25)
at .<init>(<console>:29)
at .<clinit>(<console>)
at .<init>(<console>:7)
at .<clinit>(<console>)
at $print(<console>)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:772)
at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1040)
at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:609)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:640)
at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:604)
at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:788)
at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:833)
at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:745)
at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:593)
at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:600)
at org.apache.spark.repl.SparkILoop.loop(SparkILoop.scala:603)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply$mcZ$sp(SparkILoop.scala:926)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:876)
at org.apache.spark.repl.SparkILoop$$anonfun$process$1.apply(SparkILoop.scala:876)
at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:876)
at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:968)
at org.apache.spark.repl.Main$.main(Main.scala:31)
at org.apache.spark.repl.Main.main(Main.scala)
Caused by: java.lang.ClassNotFoundException: com.google.protobuf.ServiceException
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
... 84 more
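Can anyone help me? :)
Answer 0 (score: 0)
This looks like a package version mismatch.
Spark is very sensitive to the version of the cluster it runs on, and it must be compiled against that same version.
For example, here are the instructions for a Cloudera 5.3 cluster: http://www.cloudera.com/content/cloudera/en/documentation/core/v5-3-x/topics/cdh_ig_spark_installation.html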
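One way to see which Hadoop client version the shell actually picked up is to ask Hadoop's own `VersionInfo` class. The sketch below is only an illustration to paste into spark-shell. `hadoopVersion` is a hypothetical helper name, and the class is looked up reflectively so the snippet also runs (and reports that nothing was found) when no Hadoop jars are on the classpath.

```scala
// Hypothetical helper: returns the Hadoop version found on the driver's
// classpath, or None if org.apache.hadoop.util.VersionInfo cannot be loaded.
def hadoopVersion(): Option[String] =
  try {
    val cls = Class.forName("org.apache.hadoop.util.VersionInfo")
    // getVersion is a public static method, so the receiver is null.
    Some(cls.getMethod("getVersion").invoke(null).toString)
  } catch {
    case _: ClassNotFoundException | _: LinkageError => None
  }

println("Hadoop client version on classpath: " + hadoopVersion().getOrElse("not found"))
```

Comparing the printed version with the version the cluster itself runs should show whether the two match. Note that this only inspects the driver JVM, not the executors.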
Answer 1 (score: 0)
I found that this problem was caused by the protobuf-java-2.4.1.jar file missing from this directory: /opt/cloudera/parcels/SPARK/lib/spark/lib
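To confirm whether the jar is visible to the shell, you can ask the JVM where (or whether) it can load the class named in the exception. This is a minimal sketch to paste into spark-shell. `whereIs` is a hypothetical helper name, not part of Spark, and it only checks the driver JVM.

```scala
// Hypothetical helper: returns the jar or directory a class was loaded from,
// or None if the class cannot be loaded at all.
def whereIs(className: String): Option[String] =
  try {
    val cls = Class.forName(className)
    // Classes from the bootstrap class loader have no code source.
    val location = Option(cls.getProtectionDomain.getCodeSource)
      .flatMap(cs => Option(cs.getLocation))
      .map(_.toString)
      .getOrElse("(bootstrap class loader)")
    Some(location)
  } catch {
    case _: ClassNotFoundException | _: LinkageError => None
  }

// The class from the stack trace, then a class that always exists for comparison.
println(whereIs("com.google.protobuf.ServiceException"))
println(whereIs("java.lang.String"))
```

If the first line prints None, the protobuf jar is not on the driver's classpath, which matches the ClassNotFoundException in the question. If it prints a path, that path shows which protobuf jar is actually being used.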
Answer 2 (score: 0)
I ran into the same kind of problem with an earlier protobuf version (2.5.0). Here is how I worked through it step by step. Hope this helps.
Exception in thread "dag-scheduler-event-loop" java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AppendRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
https://issues.apache.org/jira/browse/HADOOP-9845
You need to exclude protobuf v2.5.0 in "Spark-core".