I am trying to submit a .jar application to a Spark cluster with spark-submit. The binary distribution in use is spark-2.1.1-bin-hadoop2.7.
The problem is that on submission the application fails with the following exception:
$:/path-to-spark-binary/spark-submit --class Main --master local[*] app.jar
Exception in thread "main" java.lang.NoClassDefFoundError: org/bson/codecs/IterableCodecProvider
at com.mongodb.MongoClient.<clinit>(MongoClient.java:85)
at com.mongodb.MongoClientOptions$Builder.<init>(MongoClientOptions.java:829)
at com.mongodb.MongoClientURI.<init>(MongoClientURI.java:183)
The application uses a MongoClient object to connect to a remote MongoDB database. It fails with the above error exactly when a new MongoClient object is constructed.
I am using Maven to build an uber jar that contains all dependencies. The MongoDB library I use is mongo-java-driver.jar, from here.
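For reference, the uber-jar build is along the lines of the maven-shade-plugin snippet below. This is a reconstruction for illustration only: the plugin version and the `Main` class name are assumptions, not necessarily what my actual pom.xml contains:

```xml
<!-- Illustrative sketch of an uber-jar build; plugin version and
     <mainClass> are assumptions, not my exact configuration. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <version>3.0.0</version>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <transformer implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer">
            <mainClass>Main</mainClass>
          </transformer>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>
```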
I should also mention that the application runs fine when executed on the remote server without Spark:
java -jar app.jar // ok
It also runs fine when submitted locally on my personal computer with the same Spark binaries.
I found that the application does not fail when constructing a new IterableCodecProvider object directly; it fails only when constructing a new MongoClient object. Is this a static-initialization problem? Is there another way to connect to a MongoDB database without the library above?
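To narrow this down, a small diagnostic could be run inside the job to print which jar each class is actually loaded from. The class names queried below are the ones from my stack trace; the rest is a generic sketch, not code from my application:

```java
import java.security.CodeSource;

// Diagnostic sketch: report the jar (or other location) a class
// was loaded from, to spot classpath conflicts between the uber jar
// and $SPARK_HOME/jars.
public class WhereLoaded {
    static String locationOf(String className) {
        try {
            Class<?> c = Class.forName(className);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            // Bootstrap/JDK classes have no CodeSource.
            return src == null ? "bootstrap/JDK" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "NOT FOUND";
        }
    }

    public static void main(String[] args) {
        // Inside the Spark job I would query the classes from the stack trace:
        System.out.println(locationOf("com.mongodb.MongoClient"));
        System.out.println(locationOf("org.bson.codecs.IterableCodecProvider"));
    }
}
```

If the two classes resolve to different jars (e.g. one from the uber jar, one from $SPARK_HOME/jars), that would point at a version conflict rather than a missing dependency.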
Edit 1: the code that produces the exception:
try (SparkSession spark = SparkSession
        .builder()
        .config("spark.master", "local")
        .config("spark.mongodb.input.uri", Constants.sparkMongoDatabaseURI)
        .config("spark.mongodb.output.uri", Constants.sparkMongoDatabaseURI)
        .appName("Spark Session 1").getOrCreate()) {
    JavaSparkContext jsc = new JavaSparkContext(spark.sparkContext());
    MongoClientURI mongoClientURI = new MongoClientURI(Constants.dbMongoURI);
    MongoClient mongoClient = new MongoClient(mongoClientURI);
    ...
}
Edit 2: I also tried spark-shell; the same commands produce exactly the same exception:
$:/path-to-spark-binary/spark-shell --packages org.mongodb.spark:mongo-spark-connector_2.11:2.0.0
import org.apache.spark.{SparkContext, SparkConf}
import com.mongodb.spark.MongoSpark;
sc.stop()
val sc2 = new SparkContext(new SparkConf().setAppName("shell").set("spark.mongodb.input.uri", "mongodb://my_host:my_port/my_database.my_collection"))
MongoSpark.load(sc2)
Edit 3: the classpath
I printed the classpath at runtime with the following code:
ClassLoader cl = ClassLoader.getSystemClassLoader();
URL[] urls = ((URLClassLoader) cl).getURLs();
for (URL url : urls) {
    System.out.println(url.getFile());
}
Grepping the result for the keyword "mongo" shows the following .jars:
$SPARK_HOME/jars/spark-mongodb_2.11-0.12.0.jar
$SPARK_HOME/jars/mongodb-driver-3.4.2.jar
$SPARK_HOME/jars/mongo-hadoop-spark-2.0.2.jar
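Since both the uber jar and $SPARK_HOME/jars apparently ship MongoDB driver classes, a sketch like the following could list every classpath entry that provides the missing class (a generic diagnostic, not code from my application; the class name is the one from my stack trace):

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;

// Diagnostic sketch: list every classpath entry that contains a given
// class file, to spot duplicate copies of the MongoDB driver.
public class FindAllCopies {
    static List<String> copiesOf(String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        List<String> locations = new ArrayList<>();
        Enumeration<URL> urls = Thread.currentThread()
                .getContextClassLoader().getResources(resource);
        while (urls.hasMoreElements()) {
            locations.add(urls.nextElement().toString());
        }
        return locations;
    }

    public static void main(String[] args) throws IOException {
        for (String loc : copiesOf("org.bson.codecs.IterableCodecProvider")) {
            System.out.println(loc);
        }
    }
}
```

If this prints nothing on the cluster, the class is genuinely absent from every jar on the classpath; if it prints one entry while com.mongodb.MongoClient resolves to a different jar, the two driver copies are mismatched.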
The pom.xml contains the following dependencies:
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-sql_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<dependency>
    <groupId>org.mongodb.spark</groupId>
    <artifactId>mongo-spark-connector_2.11</artifactId>
    <version>2.0.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.spark/spark-mllib_2.11 -->
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-mllib_2.11</artifactId>
    <version>2.1.1</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.apache.kafka/kafka-clients -->
<dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka-clients</artifactId>
    <version>0.10.2.1</version>
</dependency>
<dependency>
    <groupId>com.google.code.gson</groupId>
    <artifactId>gson</artifactId>
    <version>2.8.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/com.cloudera.sparkts/sparkts -->
<dependency>
    <groupId>com.cloudera.sparkts</groupId>
    <artifactId>sparkts</artifactId>
    <version>0.4.0</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.drools/drools-compiler -->
<dependency>
    <groupId>org.drools</groupId>
    <artifactId>drools-compiler</artifactId>
    <version>6.5.0.Final</version>
</dependency>
<!-- https://mvnrepository.com/artifact/org.mongodb/mongo-java-driver -->
<dependency>
    <groupId>org.mongodb</groupId>
    <artifactId>mongo-java-driver</artifactId>
    <version>3.4.2</version>
</dependency>