I am trying out a simple example with the Spark Cassandra Connector.
I am using Cassandra 2.0.9 and Spark 1.1.0.
When I run an SQL query against a JavaRDD built from a CassandraJavaRDD, I get the following error:
06 Jan 2015 17:01:28,077 DEBUG Cluster : Cannot connect with protocol V3, trying V2
06 Jan 2015 17:01:28,075 DEBUG Connection : Connection[/127.0.0.1:9042-1, inFlight=0, closed=true] closing connection
06 Jan 2015 17:01:28,077 DEBUG Connection : Connection[/127.0.0.1:9042-1, inFlight=0, closed=true] has already terminated
06 Jan 2015 17:01:28,080 DEBUG Connection : Connection[/127.0.0.1:9042-2, inFlight=0, closed=false] Transport initialized and ready
06 Jan 2015 17:01:28,080 DEBUG ControlConnection: [Control connection] Refreshing node list and token map
06 Jan 2015 17:01:28,090 DEBUG ControlConnection: [Control connection] Refreshing schema
06 Jan 2015 17:01:28,142 DEBUG ControlConnection: [Control connection] Refreshing node list and token map
06 Jan 2015 17:01:28,156 DEBUG ControlConnection: [Control connection] Successfully connected to /127.0.0.1:9042
06 Jan 2015 17:01:28,156 INFO Cluster : New Cassandra host /127.0.0.1:9042 added
06 Jan 2015 17:01:28,156 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
06 Jan 2015 17:01:28,156 INFO LocalNodeFirstLoadBalancingPolicy: Adding host 127.0.0.1 (datacenter1)
06 Jan 2015 17:01:28,167 DEBUG Connection : Connection[/127.0.0.1:9042-3, inFlight=0, closed=false] Transport initialized and ready
06 Jan 2015 17:01:28,176 DEBUG Connection : Connection[/127.0.0.1:9042-4, inFlight=0, closed=false] Transport initialized and ready
06 Jan 2015 17:01:28,176 DEBUG Session : Added connection pool for /127.0.0.1:9042
06 Jan 2015 17:01:28,177 INFO LocalNodeFirstLoadBalancingPolicy: Adding host 127.0.0.1 (datacenter1)
06 Jan 2015 17:01:28,194 DEBUG CassandraRDD : Fetching data for range token("m_id") > ? AND token("m_id") <= ? with SELECT "m_id", "m_name" FROM "my_keyspace"."m_table" WHERE token("m_id") > ? AND token("m_id") <= ? ALLOW FILTERING with params [-3710785879179969863,-3308243544180364096]
06 Jan 2015 17:01:28,624 DEBUG CassandraRDD : Row iterator for range token("m_id") > ? AND token("m_id") <= ? obtained successfully.
06 Jan 2015 17:01:28,633 DEBUG CassandraRDD : Fetched 1 rows from my_keyspace.m_table for partition 0 in 0.455 s.
06 Jan 2015 17:01:28,634 ERROR Executor : Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalArgumentException: object is not an instance of declaring class
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:601)
at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(JavaSQLContext.scala:100)
at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(JavaSQLContext.scala:100)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$1$$anonfun$apply$1.apply(JavaSQLContext.scala:100)
at org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$1$$anonfun$apply$1.apply(JavaSQLContext.scala:99)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
at scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
at org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1165)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
at org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
at org.apache.spark.scheduler.Task.run(Task.scala:54)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
06 Jan 2015 17:01:28,640 DEBUG LocalActor : [actor] received message StatusUpdate(0,FAILED,java.nio.HeapByteBuffer[pos=0 lim=2723 cap=2723]) from Actor[akka://sparkDriver/deadLetters]
06 Jan 2015 17:01:28,641 DEBUG TaskSchedulerImpl: parentName: , name: TaskSet_0, runningTasks: 0
06 Jan 2015 17:01:28,644 INFO TaskSetManager : Starting task 1.0 in stage 0.0 (TID 1, localhost, ANY, 6186 bytes)
06 Jan 2015 17:01:28,645 INFO Executor : Running task 1.0 in stage 0.0 (TID 1)
06 Jan 2015 17:01:28,645 DEBUG LocalActor : [actor] handled message (4.569476 ms) StatusUpdate(0,FAILED,java.nio.HeapByteBuffer[pos=2723 lim=2723 cap=2723]) from Actor[akka://sparkDriver/deadLetters]
06 Jan 2015 17:01:28,645 DEBUG LocalActor : [actor] received message StatusUpdate(1,RUNNING,java.nio.HeapByteBuffer[pos=0 lim=0 cap=0]) from Actor[akka://sparkDriver/deadLetters]
06 Jan 2015 17:01:28,647 WARN TaskSetManager : Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalArgumentException: object is not an instance of declaring class
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:601)
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(JavaSQLContext.scala:100)
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(JavaSQLContext.scala:100)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$1$$anonfun$apply$1.apply(JavaSQLContext.scala:100)
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$1$$anonfun$apply$1.apply(JavaSQLContext.scala:99)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1165)
org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
06 Jan 2015 17:01:28,649 ERROR TaskSetManager : Task 0 in stage 0.0 failed 1 times; aborting job
06 Jan 2015 17:01:28,650 DEBUG BlockManager : Getting local block broadcast_0
06 Jan 2015 17:01:28,650 DEBUG BlockManager : Level for block broadcast_0 is StorageLevel(true, true, false, true, 1)
06 Jan 2015 17:01:28,650 DEBUG BlockManager : Getting block broadcast_0 from memory
06 Jan 2015 17:01:28,650 DEBUG LocalActor : [actor] handled message (5.114644 ms) StatusUpdate(1,RUNNING,java.nio.HeapByteBuffer[pos=0 lim=0 cap=0]) from Actor[akka://sparkDriver/deadLetters]
06 Jan 2015 17:01:28,650 DEBUG Executor : Task 1's epoch is 0
06 Jan 2015 17:01:28,655 INFO TaskSchedulerImpl: Cancelling stage 0
06 Jan 2015 17:01:28,658 DEBUG LocalActor : [actor] received message KillTask(1,false) from Actor[akka://sparkDriver/deadLetters]
06 Jan 2015 17:01:28,659 INFO Executor : Executor is trying to kill task 1.0 in stage 0.0 (TID 1)
06 Jan 2015 17:01:28,659 INFO TaskSchedulerImpl: Stage 0 was cancelled
06 Jan 2015 17:01:28,659 DEBUG LocalActor : [actor] handled message (0.860446 ms) KillTask(1,false) from Actor[akka://sparkDriver/deadLetters]
06 Jan 2015 17:01:28,661 INFO DAGScheduler : Failed to run count at JavaSchemaRDD.scala:42
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost): java.lang.IllegalArgumentException: object is not an instance of declaring class
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
java.lang.reflect.Method.invoke(Method.java:601)
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(JavaSQLContext.scala:100)
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$1$$anonfun$apply$1$$anonfun$apply$2.apply(JavaSQLContext.scala:100)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$1$$anonfun$apply$1.apply(JavaSQLContext.scala:100)
org.apache.spark.sql.api.java.JavaSQLContext$$anonfun$1$$anonfun$apply$1.apply(JavaSQLContext.scala:99)
scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
scala.collection.Iterator$$anon$14.hasNext(Iterator.scala:389)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
org.apache.spark.util.Utils$.getIteratorSize(Utils.scala:1165)
org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
org.apache.spark.rdd.RDD$$anonfun$count$1.apply(RDD.scala:904)
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
org.apache.spark.scheduler.Task.run(Task.scala:54)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
java.lang.Thread.run(Thread.java:722)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1185)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1174)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1173)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1173)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:688)
at scala.Option.foreach(Option.scala:236)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:688)
at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1391)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
at akka.actor.ActorCell.invoke(ActorCell.scala:456)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
at akka.dispatch.Mailbox.run(Mailbox.scala:219)
at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
06 Jan 2015 17:01:28,667 DEBUG DAGScheduler : Removing running stage 0
Here is my code:
SparkConf conf = new SparkConf();
conf.setAppName("Spark Cassandra demo");
conf.setMaster("local");
conf.set("spark.cassandra.connection.host", "127.0.0.1");
JavaSparkContext sc = new JavaSparkContext(conf);

SparkContextJavaFunctions javaFunctions = CassandraJavaUtil.javaFunctions(sc);
logger.debug("javaFunctions=[" + javaFunctions + "]");

CassandraJavaRDD<CassandraRow> mCassandraRDD = javaFunctions.cassandraTable("my_keyspace", "m_table");
logger.debug("mCassandraRDD=[" + mCassandraRDD + "]");

mCassandraRDD.map(new Function<CassandraRow, MObject>() {
    @Override
    public MObject call(CassandraRow row) throws Exception {
        MObject mObject = new MObject();
        mObject.setId(row.getString("m_id"));
        mObject.setName(row.getString("m_name"));
        return mObject;
    }
});

JavaSQLContext sqlCtx = new JavaSQLContext(sc);
JavaSchemaRDD schemaMObject = sqlCtx.applySchema(mCassandraRDD, MObject.class);
schemaMObject.registerTempTable("MOBJECT_SPARK");

JavaSchemaRDD johnRDD = sqlCtx.sql("SELECT * FROM MOBJECT_SPARK WHERE name='john'");
System.out.println("Count=[" + johnRDD.count() + "]");
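
For completeness, MObject is a plain serializable JavaBean, roughly along these lines (reconstructed from the setId/setName calls above, so treat it as a sketch; the real class may carry more fields):

import java.io.Serializable;

// Minimal sketch of MObject, inferred from the setters used in the map
// function above. applySchema derives the SQL schema from the getters.
public class MObject implements Serializable {
    private String id;
    private String name;

    public String getId() { return id; }
    public void setId(String id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
}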
I am not sure what is wrong with this code.
Any input is appreciated. Thanks.
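
For reference, this is the shape I understood the RDD-to-SQL handoff should take, i.e. materializing the mapped RDD and passing that to applySchema (a sketch only; mRDD and schemaFromMapped are my own names, not connector API):

// Sketch: assign the result of map() and feed the mapped RDD, rather
// than the raw CassandraRow RDD, into applySchema.
JavaRDD<MObject> mRDD = mCassandraRDD.map(new Function<CassandraRow, MObject>() {
    @Override
    public MObject call(CassandraRow row) throws Exception {
        MObject mObject = new MObject();
        mObject.setId(row.getString("m_id"));
        mObject.setName(row.getString("m_name"));
        return mObject;
    }
});
JavaSchemaRDD schemaFromMapped = sqlCtx.applySchema(mRDD, MObject.class);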