I am writing a custom Spark Streaming MongoDB receiver so that I can read data from a MongoDB collection using a Spark DStream.
Here is the code I wrote:
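
    import scala.reflect.ClassTag
    import scala.util.{Failure, Success}

    import org.apache.spark.storage.StorageLevel
    import org.apache.spark.streaming.receiver.Receiver
    import org.mongodb.scala.{Observer, Subscription} // assumed: MongoDB Scala driver
    import org.slf4j.LoggerFactory

    // MongoDefaultConnector, MongoOptions and getPickObservable are not shown here.
    class MongoDBReceiver[D: ClassTag](mongoConnector: MongoDefaultConnector,
                                       findOptions: MongoOptions,
                                       storageLevel: StorageLevel) extends Receiver[D](storageLevel) {

      val logger = LoggerFactory.getLogger(getClass)
      private var subscription: Option[Subscription] = None

      // onStart must not block, so the read runs on its own thread.
      override def onStart(): Unit = {
        new Thread() {
          override def run(): Unit = {
            logger.info("starting")
            receive()
          }
        }.start()
      }

      def receive(): Unit = {
        mongoConnector.getCollection[D]() match {
          case Success(collection) =>
            getPickObservable(collection, findOptions).snapshot(true).subscribe(
              new Observer[D] {
                override def onSubscribe(sub: Subscription): Unit = {
                  subscription = Some(sub)
                  sub.request(Long.MaxValue)
                }
                // Hand each document to Spark.
                override def onNext(doc: D): Unit = store(doc)
                override def onError(throwable: Throwable): Unit = stop("Observable errored", throwable)
                // Called once the query has returned all of its documents.
                override def onComplete(): Unit = stop("publisher finished")
              }
            )
          case Failure(ex) => stop("Failed to connect to MongoDB", ex)
        }
      }

      override def onStop(): Unit = {
        logger.info("stopping")
      }
    }
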
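This works, but the job reads the same documents multiple times. According to the logs, the receiver keeps starting and stopping, so it repeats the same processing over and over. Here are the logs I get: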

    20/01/07 15:06:21 INFO MongoDBReceiver: starting
    20/01/07 15:06:21 INFO cluster: Cluster created with settings {hosts=[sitewhere-mongodb-rd.gfxiq.prv:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=500}
    20/01/07 15:06:21 INFO cluster: No server chosen by com.mongodb.async.client.ClientSessionHelper$1@72859317 from cluster description ClusterDescription{type=UNKNOWN, connectionMode=SINGLE, serverDescriptions=[ServerDescription{address=sitewhere-mongodb-rd.gfxiq.prv:27017, type=UNKNOWN, state=CONNECTING}]}. Waiting for 30000 ms before timing out
    20/01/07 15:06:21 INFO connection: Opened connection [connectionId{localValue:65, serverValue:14408}] to sitewhere-mongodb-rd.gfxiq.prv:27017
    20/01/07 15:06:21 INFO cluster: Monitor thread successfully connected to server with description ServerDescription{address=sitewhere-mongodb-rd.gfxiq.prv:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 4, 9]}, minWireVersion=0, maxWireVersion=5, maxDocumentSize=16777216, logicalSessionTimeoutMinutes=null, roundTripTimeNanos=12612869}
    20/01/07 15:06:21 INFO connection: Opened connection [connectionId{localValue:66, serverValue:14409}] to sitewhere-mongodb-rd.gfxiq.prv:27017
    20/01/07 15:06:21 INFO ReceiverSupervisorImpl: Stopping receiver with message: publisher finished:
    20/01/07 15:06:21 INFO MongoDBReceiver: stopping
    20/01/07 15:06:21 INFO ReceiverSupervisorImpl: Called receiver onStop
    20/01/07 15:06:21 INFO ReceiverSupervisorImpl: Deregistering receiver 0
    20/01/07 15:06:21 ERROR ReceiverTracker: Deregistered receiver for stream 0: publisher finished
    20/01/07 15:06:21 INFO ReceiverSupervisorImpl: Stopped receiver 0

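Do you know how to fix this, so that the connector reads the collection only once and stays connected for as long as the connection to the MongoDB server is up?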