I have three JSON files stored in MongoDB, and I want to manipulate them to obtain a specific DataFrame.
import com.mongodb.spark.MongoSpark
import com.mongodb.spark.config.ReadConfig

// Users collection
val readConfigUser = ReadConfig(Map("uri" -> "mongodb://<IP>:<port>/db.collection1"))
val userDF = MongoSpark
  .load(sc, readConfigUser)
  .toDF()
  .select($"user_id", $"review_count", $"friends", $"fans")

// Businesses collection
val readConfigBusiness = ReadConfig(Map("uri" -> "mongodb://<IP>:<port>/yelpdb.businessCollection"))
val businessDF = MongoSpark
  .load(sc, readConfigBusiness)
  .toDF()
  .select($"business_id", $"categories", $"review_count", $"stars")

// Reviews collection
val readConfigReview = ReadConfig(Map("uri" -> "mongodb://<IP>:<port>/yelpdb.reviewsCollection"))
val reviewDF = MongoSpark
  .load(sc, readConfigReview)
  .toDF()
  .select($"review_id", $"user_id", $"cool", $"stars", $"business_id", $"useful", $"funny")
After several operations, I want to find the maximum value in a DataFrame by executing this snippet:
val max_influence = final_tempDF
  .agg(max("influence").as("max_influence"))
  .first
  .getAs[Double](0)
val finalDF = final_tempDF
  .select(
    $"user_id",
    $"business_id",
    $"categories_reviewed",
    $"list_users_with_same_reviews_business",
    ($"influence" / max_influence).as("normalized_influence"))
However, execution fails at this point with:
org.apache.spark.SparkException: Job aborted due to stage failure: Task 2 in stage 11.0 failed 1 times, most recent failure: Lost task 2.0 in stage 11.0 (TID 32, localhost, executor driver): com.mongodb.MongoCursorNotFoundException: Query failed with error code -5 and error message 'Cursor 50354548740 not found on server <IP>:<port>' on server <IP>:<port>
This is what the MongoDB log file reports:
2017-09-22T15:54:08.565+0200 I COMMAND [conn36] killcursors: found 0 of 1
2017-09-22T15:54:09.184+0200 I - [conn36] end connection 35.167.27.25:55218 (14 connections now open)
2017-09-22T15:56:35.642+0200 I - [conn44] end connection 35.167.27.25:34996 (13 connections now open)
2017-09-22T15:56:35.642+0200 I - [conn47] end connection 35.167.27.25:35438 (12 connections now open)
2017-09-22T15:56:35.642+0200 I - [conn48] end connection 35.167.27.25:35604 (11 connections now open)
2017-09-22T15:56:35.643+0200 I - [conn46] end connection 35.167.27.25:35418 (10 connections now open)
2017-09-22T15:56:35.643+0200 I - [conn45] end connection 35.167.27.25:35012 (9 connections now open)
What is the problem, and how can I fix it? I am using Databricks Community Edition (I am a student), and the MongoDB version is 3.4.9.