Scala-Spark reading and manipulating data from MongoDB: cursor not found exception

Date: 2017-09-22 14:30:40

Tags: mongodb scala apache-spark databricks

I have 3 JSON files stored in MongoDB, and I want to manipulate them to obtain a specific DataFrame:

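// Imports these snippets rely on (Databricks notebooks usually have spark.implicits._ in scope already)
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig
import org.apache.spark.sql.functions.max
import spark.implicits._

// Load each collection as a DataFrame (schema inferred by sampling) and keep only the needed columns
val readConfigUser = ReadConfig(Map("uri" -> "mongodb://<IP>:<port>/db.collection1"))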
val userDF = MongoSpark
.load(sc,readConfigUser)
.toDF()
.select($"user_id", $"review_count", $"friends", $"fans")

val readConfigBusiness = ReadConfig(Map("uri" -> "mongodb://<IP>:<port>/yelpdb.businessCollection"))
val businessDF = MongoSpark
.load(sc,readConfigBusiness)
.toDF()
.select($"business_id", $"categories", $"review_count", $"stars")

val readConfigReview = ReadConfig(Map("uri" -> "mongodb://<IP>:<port>/yelpdb.reviewsCollection"))
val reviewDF = MongoSpark
.load(sc,readConfigReview)
.toDF()
.select($"review_id", $"user_id", $"cool", $"stars", $"business_id", $"useful", $"funny")

After several transformations, I want to find the maximum value in the DataFrame, so I run this snippet:

val max_influence = final_tempDF.agg(max("influence") as ("max_influence")).first.getAs[Double](0)

val finalDF = final_tempDF
.select($"user_id", $"business_id", $"categories_reviewed", $"list_users_with_same_reviews_business", ($"influence" / max_influence) as ("normalized_influence"))
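The snippet above divides each `influence` value by the column maximum. As a minimal plain-Scala sketch of the same arithmetic (no Spark or MongoDB involved), the following may help check the expected normalized values. `InfluenceRow`, `NormalizedRow`, and `Normalize` are hypothetical names, and the row type keeps only a few of the columns of `final_tempDF`. The zero-maximum fallback to 0.0 is an assumption; Spark itself returns null for a division by zero by default.

```scala
// Hypothetical stand-in for a few columns of final_tempDF
final case class InfluenceRow(userId: String, businessId: String, influence: Double)

// Hypothetical output row carrying the normalized value
final case class NormalizedRow(userId: String, businessId: String, normalizedInfluence: Double)

object Normalize {
  // Mirrors agg(max("influence")) followed by ($"influence" / max_influence)
  def normalize(rows: Seq[InfluenceRow]): Seq[NormalizedRow] =
    if (rows.isEmpty) Seq.empty
    else {
      // Equivalent of the single aggregated value fetched with .first
      val maxInfluence = rows.map(_.influence).max
      rows.map { r =>
        // Assumption: treat a zero maximum as 0.0 instead of Spark's null
        val n = if (maxInfluence == 0.0) 0.0 else r.influence / maxInfluence
        NormalizedRow(r.userId, r.businessId, n)
      }
    }
}
```

For example, influences 2.0 and 4.0 normalize to 0.5 and 1.0. In Spark, `agg(...).first` is an action, so it is also the point where the pending lazy transformations (including the reads from the MongoDB collections, unless they were cached earlier) are actually executed; that is presumably why the failure surfaces at this line.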

But execution fails at this point:

org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 11.0 failed 1 times, most recent failure: Lost task 2.0 in stage 11.0 (TID 32, localhost, executor driver): com.mongodb.MongoCursorNotFoundException: Query failed with error code -5 and error message 'Cursor 50354548740 not found on server <IP>:<port>' on server <IP>:<port>

The MongoDB log file reports the following:

2017-09-22T15:54:08.565+0200 I COMMAND  [conn36] killcursors: found 0 of 1
2017-09-22T15:54:09.184+0200 I -        [conn36] end connection 35.167.27.25:55218 (14 connections now open)
2017-09-22T15:56:35.642+0200 I -        [conn44] end connection 35.167.27.25:34996 (13 connections now open)
2017-09-22T15:56:35.642+0200 I -        [conn47] end connection 35.167.27.25:35438 (12 connections now open)
2017-09-22T15:56:35.642+0200 I -        [conn48] end connection 35.167.27.25:35604 (11 connections now open)
2017-09-22T15:56:35.643+0200 I -        [conn46] end connection 35.167.27.25:35418 (10 connections now open)
2017-09-22T15:56:35.643+0200 I -        [conn45] end connection 35.167.27.25:35012 (9 connections now open)

What is the problem, and how can I fix it? I am using Databricks Community Edition (I am a student), and the MongoDB version is 3.4.9.

0 answers