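I am running into a problem when using spark-cassandra-connector together with spark-shell.
In general, I followed the "Using spark-cassandra-connector" section of this tutorial, "Installing the Cassandra / Spark OSS Stack" by Amy Tobey. What I see is:
I manage to connect to the Cassandra cluster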
INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
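but I cannot run the count method on the table object of class CassandraTableScanRDD.
I don't know how to interpret the console error output (googling it turned up nothing useful), and I would like to know what I am doing wrong.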
CONSOLE OUTPUT:
1. Run Spark with the spark-cassandra-connector jar
$ /usr/local/src/spark/spark-1.1.0/bin/spark-shell --jars /usr/local/src/spark/spark-1.1.0/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/03/30 01:17:40 INFO SecurityManager: Changing view acls to: martakarass,
15/03/30 01:17:40 INFO SecurityManager: Changing modify acls to: martakarass,
15/03/30 01:17:40 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(martakarass, ); users with modify permissions: Set(martakarass, )
15/03/30 01:17:40 INFO HttpServer: Starting HTTP Server
15/03/30 01:17:40 INFO Utils: Successfully started service 'HTTP class server' on port 38860.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 1.1.0
/_/
Using Scala version 2.10.4 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_31)
Type in expressions to have them evaluated.
Type :help for more information.
15/03/30 01:17:42 INFO SecurityManager: Changing view acls to: martakarass,
15/03/30 01:17:42 INFO SecurityManager: Changing modify acls to: martakarass,
15/03/30 01:17:42 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(martakarass, ); users with modify permissions: Set(martakarass, )
15/03/30 01:17:43 INFO Slf4jLogger: Slf4jLogger started
15/03/30 01:17:43 INFO Remoting: Starting remoting
15/03/30 01:17:43 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@marta-komputer.home:48238]
15/03/30 01:17:43 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@marta-komputer.home:48238]
15/03/30 01:17:43 INFO Utils: Successfully started service 'sparkDriver' on port 48238.
15/03/30 01:17:43 INFO SparkEnv: Registering MapOutputTracker
15/03/30 01:17:43 INFO SparkEnv: Registering BlockManagerMaster
15/03/30 01:17:43 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150330011743-7904
15/03/30 01:17:43 INFO Utils: Successfully started service 'Connection manager for block manager' on port 55197.
15/03/30 01:17:43 INFO ConnectionManager: Bound socket to port 55197 with id = ConnectionManagerId(marta-komputer.home,55197)
15/03/30 01:17:43 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/03/30 01:17:43 INFO BlockManagerMaster: Trying to register BlockManager
15/03/30 01:17:43 INFO BlockManagerMasterActor: Registering block manager marta-komputer.home:55197 with 265.1 MB RAM
15/03/30 01:17:43 INFO BlockManagerMaster: Registered BlockManager
15/03/30 01:17:43 INFO HttpFileServer: HTTP File server directory is /tmp/spark-f69a93d0-da4f-4c85-9b46-8ad33169763a
15/03/30 01:17:43 INFO HttpServer: Starting HTTP Server
15/03/30 01:17:43 INFO Utils: Successfully started service 'HTTP file server' on port 38225.
15/03/30 01:17:43 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/30 01:17:43 INFO SparkUI: Started SparkUI at http://marta-komputer.home:4040
15/03/30 01:17:43 INFO SparkContext: Added JAR file:/usr/local/src/spark/spark-1.1.0/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar at http://192.168.1.10:38225/jars/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar with timestamp 1427671063959
15/03/30 01:17:44 INFO Executor: Using REPL class URI: http://192.168.1.10:38860
15/03/30 01:17:44 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@marta-komputer.home:48238/user/HeartbeatReceiver
15/03/30 01:17:44 INFO SparkILoop: Created spark context..
Spark context available as sc.
2. Perform the imports
scala>
scala> sc.stop
15/03/30 01:17:51 INFO SparkUI: Stopped Spark web UI at http://marta-komputer.home:4040
15/03/30 01:17:51 INFO DAGScheduler: Stopping DAGScheduler
15/03/30 01:17:52 INFO MapOutputTrackerMasterActor: MapOutputTrackerActor stopped!
15/03/30 01:17:52 INFO ConnectionManager: Selector thread was interrupted!
15/03/30 01:17:52 INFO ConnectionManager: ConnectionManager stopped
15/03/30 01:17:52 INFO MemoryStore: MemoryStore cleared
15/03/30 01:17:52 INFO BlockManager: BlockManager stopped
15/03/30 01:17:52 INFO BlockManagerMaster: BlockManagerMaster stopped
15/03/30 01:17:52 INFO SparkContext: Successfully stopped SparkContext
scala> im15/03/30 01:17:52 INFO RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon.
port com.15/03/30 01:17:52 INFO RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports.
datastax.spark.connector._
15/03/30 01:17:52 INFO Remoting: Remoting shut down
15/03/30 01:17:52 INFO RemoteActorRefProvider$RemotingTerminator: Remoting shut down.
import com.datastax.spark.connector._
scala> import org.apache.spark.SparkContext
import org.apache.spark.SparkContext
scala> import org.apache.spark.SparkContext._
import org.apache.spark.SparkContext._
scala> import org.apache.spark.SparkConf
import org.apache.spark.SparkConf
3. Define spark.cassandra.connection.host and define the SparkContext
scala> val conf = new SparkConf(true).set("spark.cassandra.connection.host", "127.0.0.1")
conf: org.apache.spark.SparkConf = org.apache.spark.SparkConf@e6e5da4
scala> val sc = new SparkContext("local[*]", "test", conf)
15/03/30 01:17:54 INFO SecurityManager: Changing view acls to: martakarass,
15/03/30 01:17:54 INFO SecurityManager: Changing modify acls to: martakarass,
15/03/30 01:17:54 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(martakarass, ); users with modify permissions: Set(martakarass, )
15/03/30 01:17:54 INFO Slf4jLogger: Slf4jLogger started
15/03/30 01:17:54 INFO Remoting: Starting remoting
15/03/30 01:17:54 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@localhost:35080]
15/03/30 01:17:54 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriver@localhost:35080]
15/03/30 01:17:54 INFO Utils: Successfully started service 'sparkDriver' on port 35080.
15/03/30 01:17:54 INFO SparkEnv: Registering MapOutputTracker
15/03/30 01:17:54 INFO SparkEnv: Registering BlockManagerMaster
15/03/30 01:17:54 INFO DiskBlockManager: Created local directory at /tmp/spark-local-20150330011754-63ea
15/03/30 01:17:54 INFO Utils: Successfully started service 'Connection manager for block manager' on port 32973.
15/03/30 01:17:54 INFO ConnectionManager: Bound socket to port 32973 with id = ConnectionManagerId(localhost,32973)
15/03/30 01:17:54 INFO MemoryStore: MemoryStore started with capacity 265.1 MB
15/03/30 01:17:54 INFO BlockManagerMaster: Trying to register BlockManager
15/03/30 01:17:54 INFO BlockManagerMasterActor: Registering block manager localhost:32973 with 265.1 MB RAM
15/03/30 01:17:54 INFO BlockManagerMaster: Registered BlockManager
15/03/30 01:17:54 INFO HttpFileServer: HTTP File server directory is /tmp/spark-630cc34e-cc29-4815-b51f-8345250cb030
15/03/30 01:17:54 INFO HttpServer: Starting HTTP Server
15/03/30 01:17:54 INFO Utils: Successfully started service 'HTTP file server' on port 43669.
15/03/30 01:17:54 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/03/30 01:17:54 INFO SparkUI: Started SparkUI at http://localhost:4040
15/03/30 01:17:54 INFO SparkContext: Added JAR file:/usr/local/src/spark/spark-1.1.0/spark-cassandra-connector/spark-cassandra-connector/target/scala-2.10/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar at http://192.168.1.10:43669/jars/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar with timestamp 1427671074181
15/03/30 01:17:54 INFO AkkaUtils: Connecting to HeartbeatReceiver: akka.tcp://sparkDriver@localhost:35080/user/HeartbeatReceiver
sc: org.apache.spark.SparkContext = org.apache.spark.SparkContext@118a4d5
4. Use the cassandraTable function to build an object of class CassandraTableScanRDD
scala> val table = sc.cassandraTable("twissandra", "invoices")
table: com.datastax.spark.connector.rdd.CassandraTableScanRDD[com.datastax.spark.connector.CassandraRow] = CassandraTableScanRDD[0] at RDD at CassandraRDD.scala:15
5. Call the count function on the object of class CassandraTableScanRDD
scala> table.count
15/03/30 01:39:43 INFO Cluster: New Cassandra host /127.0.0.1:9042 added
15/03/30 01:39:43 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
15/03/30 01:39:43 INFO SparkContext: Starting job: reduce at CassandraTableScanRDD.scala:243
15/03/30 01:39:43 INFO DAGScheduler: Got job 0 (reduce at CassandraTableScanRDD.scala:243) with 1 output partitions (allowLocal=false)
15/03/30 01:39:43 INFO DAGScheduler: Final stage: Stage 0(reduce at CassandraTableScanRDD.scala:243)
15/03/30 01:39:43 INFO DAGScheduler: Parents of final stage: List()
15/03/30 01:39:43 INFO DAGScheduler: Missing parents: List()
15/03/30 01:39:43 INFO DAGScheduler: Submitting Stage 0 (CassandraTableScanRDD[1] at RDD at CassandraRDD.scala:15), which has no missing parents
15/03/30 01:39:43 INFO CassandraConnector: Disconnected from Cassandra cluster: Test Cluster
15/03/30 01:39:43 INFO MemoryStore: ensureFreeSpace(5320) called with curMem=0, maxMem=278019440
15/03/30 01:39:43 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 5.2 KB, free 265.1 MB)
15/03/30 01:39:43 INFO DAGScheduler: Submitting 1 missing tasks from Stage 0 (CassandraTableScanRDD[1] at RDD at CassandraRDD.scala:15)
15/03/30 01:39:43 INFO TaskSchedulerImpl: Adding task set 0.0 with 1 tasks
15/03/30 01:39:43 INFO TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, localhost, NODE_LOCAL, 26342 bytes)
15/03/30 01:39:43 INFO Executor: Running task 0.0 in stage 0.0 (TID 0)
15/03/30 01:39:43 INFO Executor: Fetching http://192.168.1.10:41700/jars/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar with timestamp 1427672382104
15/03/30 01:39:43 INFO Utils: Fetching http://192.168.1.10:41700/jars/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar to /tmp/fetchFileTemp97270090697167118.tmp
15/03/30 01:39:44 INFO Executor: Adding file:/tmp/spark-0a658f91-717f-4c30-8fe2-979c8c1399a7/spark-cassandra-connector-assembly-1.3.0-SNAPSHOT.jar to class loader
15/03/30 01:39:44 INFO Cluster: New Cassandra host /127.0.0.1:9042 added
15/03/30 01:39:44 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
15/03/30 01:39:44 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.NoSuchMethodError: org.apache.spark.SparkEnv.isStopped()Z
at org.apache.spark.metrics.CassandraConnectorSource$.instance(CassandraConnectorSource.scala:53)
at com.datastax.spark.connector.metrics.InputMetricsUpdater$.apply(InputMetricsUpdater.scala:53)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:194)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
15/03/30 01:39:44 ERROR ExecutorUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main]
java.lang.NoSuchMethodError: org.apache.spark.SparkEnv.isStopped()Z
at org.apache.spark.metrics.CassandraConnectorSource$.instance(CassandraConnectorSource.scala:53)
at com.datastax.spark.connector.metrics.InputMetricsUpdater$.apply(InputMetricsUpdater.scala:53)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:194)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
15/03/30 01:39:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.NoSuchMethodError: org.apache.spark.SparkEnv.isStopped()Z
org.apache.spark.metrics.CassandraConnectorSource$.instance(CassandraConnectorSource.scala:53)
com.datastax.spark.connector.metrics.InputMetricsUpdater$.apply(InputMetricsUpdater.scala:53)
com.datastax.spark.connector.rdd.CassandraTableScanRDD.compute(CassandraTableScanRDD.scala:194)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
15/03/30 01:39:44 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; aborting job
15/03/30 01:39:44 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
15/03/30 01:39:44 INFO TaskSchedulerImpl: Cancelling stage 0
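Answer 0 (score: 4)
java.lang.NoSuchMethodError is a common indication of a version mismatch: one of your dependencies was compiled against a newer version of another dependency, and at runtime it is given an earlier version that does not have the new method.
In this case, you are trying to run Spark-Cassandra Connector 1.3.0-SNAPSHOT against Spark 1.1.0. Try aligning these versions: either use Spark 1.3.0, or use a version of the spark-cassandra-connector that is compatible with Spark 1.1.0.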
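To confirm this kind of mismatch from inside spark-shell, one option is to ask the running JVM whether the method named in the error exists at all. In the message, `isStopped()Z` is a JVM method descriptor: no parameters, returning a boolean. The sketch below is only an illustration; `hasPublicMethod` is a hypothetical helper name, not part of Spark or the connector, and it only checks the method name, not the full signature.

```scala
import scala.util.Try

// Hypothetical helper: true if the class can be loaded and exposes
// a public method with the given name (signature is not checked).
def hasPublicMethod(className: String, methodName: String): Boolean =
  Try(Class.forName(className)).toOption
    .exists(_.getMethods.exists(_.getName == methodName))

// Sanity check against a class that is always present.
println(hasPublicMethod("java.lang.String", "isEmpty"))

// The method the connector expected to find at runtime.
// A result of false is consistent with the NoSuchMethodError above.
println(hasPublicMethod("org.apache.spark.SparkEnv", "isStopped"))
```

Running `sc.version` in the same shell shows which Spark version is actually on the classpath, which can then be compared with the version in the connector jar's file name.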
Answer 1 (score: 0)
I spent a lot of time on a similar error. It is indeed a version mismatch. I found a version compatibility table here that may help others.
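As a quick first check before consulting the table, the major.minor parts of the two version strings can be compared. This is a rough heuristic sketch, assuming the connector's release lines roughly track Spark's major.minor numbers for these 1.x releases; the compatibility table remains the authority, and `releaseLine` and `sameReleaseLine` are hypothetical names introduced here.

```scala
// Extracts (major, minor) from strings such as "1.1.0" or "1.3.0-SNAPSHOT".
def releaseLine(version: String): Option[(Int, Int)] = {
  val Pattern = """^(\d+)\.(\d+).*""".r
  version match {
    case Pattern(major, minor) => Some((major.toInt, minor.toInt))
    case _                     => None
  }
}

// Heuristic only (an assumption, not an official rule):
// treat the pair as aligned when major.minor are equal.
def sameReleaseLine(sparkVersion: String, connectorVersion: String): Boolean =
  (releaseLine(sparkVersion), releaseLine(connectorVersion)) match {
    case (Some(a), Some(b)) => a == b
    case _                  => false
  }

// The combination from the question: Spark 1.1.0 with connector 1.3.0-SNAPSHOT.
println(sameReleaseLine("1.1.0", "1.3.0-SNAPSHOT"))
```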