How do I read data from MongoDB in Zeppelin using Spark?

Asked: 2018-04-25 16:04:38

Tags: mongodb scala apache-spark apache-zeppelin hortonworks-sandbox

I am using Zeppelin on HDP 2.6 and want to read a collection from MongoDB with the Spark2 interpreter.

util.Properties.versionString
spark.version
res22: String = version 2.11.8
res23: String = 2.2.0.2.6.4.0-91

I am trying this with MongoDB 3.4.14, mongo-spark-connector 2.2.2, and mongo-java-driver 3.5.0:

import com.mongodb.spark.config.ReadConfig
import com.mongodb.spark.sql._   // brings the .mongo reader into scope

val customReadConfig = ReadConfig(Map(
  "readPreference.name" -> "secondaryPreferred",
  "uri" -> "mongodb://127.0.0.1:27017/test.collections"))
val df5 = spark.sparkSession.read.mongo(customReadConfig)

I get this error:

customReadConfig: com.mongodb.spark.config.ReadConfig.Self = ReadConfig(test,collections,Some(mongodb://127.0.0.1:27017/test.collections),1000,DefaultMongoPartitioner,Map(),15,ReadPreferenceConfig(secondaryPreferred,None),ReadConcernConfig(None),false)
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 20.0 failed 1 times, most recent failure: Lost task 0.0 in stage 20.0 (TID 20, localhost, executor driver): java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class
    at com.mongodb.spark.rdd.MongoRDD$MongoCursorIterator.<init>(MongoRDD.scala:174)
    at com.mongodb.spark.rdd.MongoRDD.compute(MongoRDD.scala:152)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
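A note on the stack trace: `java.lang.NoClassDefFoundError: scala/collection/GenTraversableOnce$class` is the typical symptom of a Scala binary-version mismatch, i.e. a library compiled against Scala 2.10 running on a Scala 2.11 Spark. Since the interpreter above reports Scala 2.11.8, it is worth verifying that the connector jar on the classpath is the `_2.11` artifact (`mongo-spark-connector_2.11`) rather than `_2.10`. A sketch of loading the matching artifact via Zeppelin's dependency interpreter (the interpreter prefix may differ per install, and the `%dep` paragraph must run before the Spark interpreter starts):

```
%dep
z.reset()
z.load("org.mongodb.spark:mongo-spark-connector_2.11:2.2.2")
```

Alternatively, the same Maven coordinate can be added under the Spark2 interpreter's Dependencies section in the Zeppelin interpreter settings.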

0 Answers:

There are no answers yet.