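From a Spark Java app submitted to a Spark cluster hosted on my machine, I am trying to connect to a Cassandra database hosted on the same machine at 127.0.0.1:9042, and my Spring Boot application fails to start.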
Approach 1 -
**Following the Spark-Cassandra-Connector link, I included the following in the POM file:**
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.11</artifactId>
    <version>2.0.0-M3</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
Approach 1 - NoSuchMethodError - log file:
16/09/08 15:12:50 ERROR SpringApplication: Application startup failed
java.lang.NoSuchMethodError: com.datastax.driver.core.KeyspaceMetadata.getMaterializedViews()Ljava/util/Collection;
at com.datastax.spark.connector.cql.Schema$.com$datastax$spark$connector$cql$Schema$$fetchTables$1(Schema.scala:281)
at com.datastax.spark.connector.cql.Schema$$anonfun$com$datastax$spark$connector$cql$Schema$$fetchKeyspaces$1$2.apply(Schema.scala:305)
at com.datastax.spark.connector.cql.Schema$$anonfun$com$datastax$spark$connector$cql$Schema$$fetchKeyspaces$1$2.apply(Schema.scala:304)
at scala.collection.TraversableLike$WithFilter$$anonfun$map$2.apply(TraversableLike.scala:683)
at scala.collection.immutable.HashSet$HashSet1.foreach(HashSet.scala:316)
at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
at scala.collection.immutable.HashSet$HashTrieSet.foreach(HashSet.scala:972)
at scala.collection.TraversableLike$WithFilter.map(TraversableLike.scala:682)
at com.datastax.spark.connector.cql.Schema$.com$datastax$spark$connector$cql$Schema$$fetchKeyspaces$1(Schema.scala:304)
at com.datastax.spark.connector.cql.Schema$$anonfun$fromCassandra$1.apply(Schema.scala:325)
at com.datastax.spark.connector.cql.Schema$$anonfun$fromCassandra$1.apply(Schema.scala:322)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withClusterDo$1.apply(CassandraConnector.scala:122)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withClusterDo$1.apply(CassandraConnector.scala:121)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:111)
at com.datastax.spark.connector.cql.CassandraConnector$$anonfun$withSessionDo$1.apply(CassandraConnector.scala:110)
at com.datastax.spark.connector.cql.CassandraConnector.closeResourceAfterUse(CassandraConnector.scala:140)
at com.datastax.spark.connector.cql.CassandraConnector.withSessionDo(CassandraConnector.scala:110)
at com.datastax.spark.connector.cql.CassandraConnector.withClusterDo(CassandraConnector.scala:121)
at com.datastax.spark.connector.cql.Schema$.fromCassandra(Schema.scala:322)
at com.datastax.spark.connector.cql.Schema$.tableFromCassandra(Schema.scala:342)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.tableDef(CassandraTableRowReaderProvider.scala:50)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef$lzycompute(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.tableDef(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableRowReaderProvider$class.verify(CassandraTableRowReaderProvider.scala:137)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.verify(CassandraTableScanRDD.scala:60)
at com.datastax.spark.connector.rdd.CassandraTableScanRDD.getPartitions(CassandraTableScanRDD.scala:232)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:248)
at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:246)
at scala.Option.getOrElse(Option.scala:121)
at org.apache.spark.rdd.RDD.partitions(RDD.scala:246)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1911)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:875)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:873)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:873)
at org.apache.spark.api.java.JavaRDDLike$class.foreach(JavaRDDLike.scala:350)
at org.apache.spark.api.java.AbstractJavaRDDLike.foreach(JavaRDDLike.scala:45)
at com.initech.myapp.cassandra.service.CassandraDataService.getMatches(CassandraDataService.java:45)
at com.initech.myapp.processunit.MySparkApp.receive(MySparkApp.java:120)
at com.initech.myapp.processunit.MySparkApp.process(MySparkApp.java:61)
at com.initech.myapp.processunit.MySparkApp.run(MySparkApp.java:144)
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:789)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:779)
at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:769)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:314)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1185)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1174)
at com.initech.myapp.MySparkAppBootApp.main(MyAppProcessingUnitsApplication.java:20)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
16/09/08 15:12:50 INFO AnnotationConfigApplicationContext: Closing org.springframework.context.annotation.AnnotationConfigApplicationContext@3381b4fc: startup date [Thu Sep 08 15:12:40 PDT 2016]; root of context hierarchy
Approach 2 -
**Since I am developing a Java Spark application, I thought of using Spark-Cassandra-Connector-Java and included the following in the POM file:**
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector_2.11</artifactId>
    <version>2.0.0-M3</version>
</dependency>
<dependency>
    <groupId>com.datastax.spark</groupId>
    <artifactId>spark-cassandra-connector-java_2.11</artifactId>
    <version>1.2.6</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-streaming_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
and ended up with this:
Approach 2 - SelectableColumnRef NoClassDefFoundError - log file:
16/09/08 16:28:07 ERROR SpringApplication: Application startup failed
java.lang.NoClassDefFoundError: com/datastax/spark/connector/SelectableColumnRef
at com.initech.myApp.cassandra.service.CassandraDataService.getMatches(CassandraDataService.java:41)
**My Spark main method calls the process() method below**
public boolean process() throws InterruptedException {
    logger.debug("In the process() method");
    // Point the connector at the local Cassandra node.
    SparkConf sparkConf = new SparkConf().setAppName("My Process Unit");
    sparkConf.set("spark.cassandra.connection.host", "127.0.0.1");
    sparkConf.set("spark.cassandra.connection.port", "9042");
    logger.debug("SparkConf = " + sparkConf);
    // Streaming context with a 1000 ms batch interval.
    JavaStreamingContext javaStreamingContext = new JavaStreamingContext(sparkConf, new Duration(1000));
    logger.debug("JavaStreamingContext = " + javaStreamingContext);
    JavaSparkContext javaSparkContext = javaStreamingContext.sparkContext();
    logger.debug("Java Spark context = " + javaSparkContext);
    JavaRDD<MyData> myDataJavaRDD = receive(javaSparkContext);
    myDataJavaRDD.foreach(myData -> {
        logger.debug("myData = " + myData);
    });
    javaStreamingContext.start();
    javaStreamingContext.awaitTermination();
    return true;
}
**which calls receive() below**
private JavaRDD<MyData> receive(JavaSparkContext javaSparkContext) {
    logger.debug("receive method called...");
    List<String> myAppConfigsStrings = myAppConfiguration.get();
    logger.debug("Received ..." + myAppConfigsStrings);
    for (String myAppConfigStr : myAppConfigsStrings) {
        ObjectMapper mapper = new ObjectMapper();
        MyAppConfig myAppConfig;
        try {
            logger.debug("Parsing the myAppConfigStr..." + myAppConfigStr);
            myAppConfig = mapper.readValue(myAppConfigStr, MyAppConfig.class);
            logger.debug("Parse Complete...");
            // Check for matching data in Cassandra
            JavaRDD<MyData> cassandraRowsRDD = cassandraDataService.getMatches(myAppConfig, javaSparkContext);
            cassandraRowsRDD.foreach(myData -> {
                logger.debug("myData = " + myData);
            });
            return cassandraRowsRDD;
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    return null;
}
**which finally calls the Cassandra data service getMatches() below**
@Service
public class CassandraDataService implements Serializable {
    private static final Log logger = LogFactory.getLog(CassandraDataService.class);

    public JavaRDD<MyData> getMatches(MyAppConfig myAppConfig, JavaSparkContext javaSparkContext) {
        logger.debug("Creating the MyDataID...");
        MyDataID myDataID = new MyDataID();
        myDataID.set...(myAppConfig.get...);
        myDataID.set...(myAppConfig.get...);
        myDataID.set...(myAppConfig.get...);
        logger.debug("MyDataID = " + myDataID);
        JavaRDD<MyData> cassandraRowsRDD = javaFunctions(javaSparkContext).cassandraTable("myKeySpace", "myData", mapRowTo(MyData.class));
        cassandraRowsRDD.foreach(myData -> {
            logger.debug("====== Cassandra Data Service ========");
            logger.debug("myData = " + myData);
            logger.debug("====== Cassandra Data Service ========");
        });
        return cassandraRowsRDD;
    }
}
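Has anyone run into a similar error, or can anyone point me in the right direction? I have tried Googling and reading through several posts, but none of them helped. Thanks.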
Update 9/9/2016 2:15 PM PST
I tried the approach above. Here is what I did -
I submitted my Spark app as a Spring Boot uber jar using the spark-submit command below -
./bin/spark-submit --class org.springframework.boot.loader.JarLauncher --master spark://localhost:6066 --deploy-mode cluster /Users/apple/Repos/Initech/Officespace/target/my-spring-spark-boot-streaming-app-0.1-SNAPSHOT.jar
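The Spark driver program started successfully and launched my Spark app, which was then set to the "WAITING" state because the only running worker had been allocated to the driver program.
In case it is useful, here is the stack I am using: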
1. cqlsh 5.0.1 | Cassandra 2.2.7 | CQL spec 3.3.1
2. Spark - 2.0.0
3. Spring Boot - 1.4.0.RELEASE
4. Jars listed in Approach 1 above
Exception Stack Trace
16/09/09 14:13:24 ERROR SpringApplication: Application startup failed
java.lang.IllegalStateException: Failed to execute ApplicationRunner
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:792)
at org.springframework.boot.SpringApplication.callRunners(SpringApplication.java:779)
at org.springframework.boot.SpringApplication.afterRefresh(SpringApplication.java:769)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:314)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1185)
at org.springframework.boot.SpringApplication.run(SpringApplication.java:1174)
at com.initech.officespace.MySpringBootSparkApp.main(MySpringBootSparkApp.java:23)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.boot.loader.MainMethodRunner.run(MainMethodRunner.java:48)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:87)
at org.springframework.boot.loader.Launcher.launch(Launcher.java:50)
at org.springframework.boot.loader.JarLauncher.main(JarLauncher.java:58)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.worker.DriverWrapper$.main(DriverWrapper.scala:58)
at org.apache.spark.deploy.worker.DriverWrapper.main(DriverWrapper.scala)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 192.168.0.30): java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:253)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Driver stacktrace:
at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1450)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1438)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1437)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1437)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:811)
at scala.Option.foreach(Option.scala:257)
at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:811)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1659)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1618)
at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1607)
at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:632)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1871)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1884)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1897)
at org.apache.spark.SparkContext.runJob(SparkContext.scala:1911)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:875)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:873)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:358)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:873)
at org.apache.spark.api.java.JavaRDDLike$class.foreach(JavaRDDLike.scala:350)
at org.apache.spark.api.java.AbstractJavaRDDLike.foreach(JavaRDDLike.scala:45)
at com.initech.officespace.cassandra.service.CassandraDataService.getMatches(CassandraDataService.java:43)
at com.initech.officespace.processunit.MyApp.receive(MyApp.java:120)
at com.initech.officespace.processunit.MyApp.process(MyApp.java:61)
at com.initech.officespace.processunit.MyApp.run(MyApp.java:144)
at org.springframework.boot.SpringApplication.callRunner(SpringApplication.java:789)
... 20 more
Caused by: java.lang.ClassNotFoundException: com.datastax.spark.connector.rdd.partitioner.CassandraPartition
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:348)
at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:67)
at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1521)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1781)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:253)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
16/09/09 14:13:24 INFO AnnotationConfigApplicationContext: Closing org.springframework.context.annotation.AnnotationConfigApplicationContext@3381b4fc: startup date [Fri Sep 09 14:10:40 PDT 2016]; root of context hierarchy
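Update 2 at 9/9/2016 3:20 PM PST
I have now resolved the issue based on the answer provided by RussS @ Issues with datastax spark-cassandra connector.
After updating my spark-submit to the command below, I see that the worker is able to pick up the connector and start working on the RDDs :)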
./bin/spark-submit --class org.springframework.boot.loader.JarLauncher --master spark://localhost:6066 --deploy-mode cluster --packages com.datastax.spark:spark-cassandra-connector_2.11:2.0.0-M3 /Users/apple/Repos/Initech/Officespace/target/my-spring-spark-boot-streaming-app-0.1-SNAPSHOT.jar
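Answer 0: (score: 0)
The solution may vary.
I ran into this exception when trying to run Spark with Cassandra from Java on my PC (the driver).
In my case, it worked to add the jar containing spark-cassandra-connector to the SparkContext, as in the example below: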
JavaSparkContext sc = new JavaSparkContext(conf);
sc.addJar("./build/libs/spark-cassandra-connector_2.11-2.4.2.jar"); // path to the connector jar; it may differ in your setup
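If the jar's version or location is not known in advance, one option is to look it up at runtime instead of hardcoding the path. The sketch below is an illustration rather than part of the answer above: `ConnectorJarLocator`, `isConnectorJar`, and `findConnectorJar` are hypothetical names, and the file-name pattern `spark-cassandra-connector_<scala>-<version>.jar` is an assumption based on the artifact name used here. It relies only on the JDK, so the `sc.addJar` call appears in a comment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class ConnectorJarLocator {

    // Assumed naming pattern: spark-cassandra-connector_<scalaVersion>-<version>.jar
    public static boolean isConnectorJar(String fileName) {
        return fileName.startsWith("spark-cassandra-connector_") && fileName.endsWith(".jar");
    }

    /** Returns the first matching jar in dir, or empty if dir is missing or has no match. */
    public static Optional<Path> findConnectorJar(Path dir) {
        if (!Files.isDirectory(dir)) {
            return Optional.empty();
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (isConnectorJar(entry.getFileName().toString())) {
                    return Optional.of(entry.toAbsolutePath());
                }
            }
            return Optional.empty();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Optional<Path> jar = findConnectorJar(Paths.get("./build/libs"));
        // With a JavaSparkContext named sc, the result could be passed on as:
        //   jar.ifPresent(p -> sc.addJar(p.toString()));
        System.out.println(jar.map(Path::toString).orElse("connector jar not found"));
    }
}
```

Returning an `Optional` keeps the "jar not found" case explicit, so the application can fail with a clear message before any task is shipped to the executors.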
Answer 1: (score: -1)
com.datastax.driver.core.KeyspaceMetadata.getMaterializedViews
appeared in version 3.0 of the driver.
Try adding this dependency to Approach 1:
<dependency>
    <groupId>com.datastax.cassandra</groupId>
    <artifactId>cassandra-driver-core</artifactId>
    <version>3.1.0</version>
</dependency>
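To confirm which driver version actually ends up on the classpath, a small reflection check may help. This is a minimal sketch using only the JDK; `ClasspathCheck`, `hasMethod`, and `loadedFrom` are hypothetical helper names, not part of Spark or the DataStax driver. Run inside the application, it reports whether `KeyspaceMetadata` exposes `getMaterializedViews` and which jar the class was loaded from.

```java
import java.security.CodeSource;
import java.util.Arrays;

public class ClasspathCheck {

    /** Returns true if the class can be loaded and has a public method with this name. */
    public static boolean hasMethod(String className, String methodName) {
        try {
            Class<?> cls = Class.forName(className);
            return Arrays.stream(cls.getMethods())
                         .anyMatch(m -> m.getName().equals(methodName));
        } catch (ClassNotFoundException | LinkageError e) {
            // Class missing, or one of its own dependencies could not be linked.
            return false;
        }
    }

    /** Returns the jar or directory the class was loaded from, or a note if unknown. */
    public static String loadedFrom(String className) {
        try {
            CodeSource src = Class.forName(className).getProtectionDomain().getCodeSource();
            return src == null ? "(bootstrap or unknown)" : String.valueOf(src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "(not on classpath)";
        }
    }

    public static void main(String[] args) {
        String cls = "com.datastax.driver.core.KeyspaceMetadata";
        System.out.println("loaded from: " + loadedFrom(cls));
        System.out.println("has getMaterializedViews: " + hasMethod(cls, "getMaterializedViews"));
    }
}
```

If `hasMethod` returns false while the class itself loads, an older cassandra-driver-core is probably shadowing the one the connector expects, and the location printed by `loadedFrom` shows where it comes from.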