Apache Spark receiver scheduling

Time: 2016-07-21 09:07:10

Tags: scala serialization apache-spark

I have implemented a receiver that is supposed to connect to a WebSocket stream and fetch the messages to be processed. Here is my implementation so far:

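import java.net.URI
import scala.util.{Failure, Success, Try}

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

// WebSocketConfig is my own type; the WebSocket client classes come from a third-party library (imports not shown).
class WebSocketReader(wsConfig: WebSocketConfig, stringMessageHandler: String => Option[String],
  storageLevel: StorageLevel) extends Receiver[String](storageLevel) {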

  // TODO: avoid using a var
  private var wsClient: WebSocketClient = _

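  def sendRequest(isRequest: Boolean, msgCount: Int): Unit = {
    // Stop requesting once Spark has stopped the receiver; otherwise this loop never ends.
    while (isRequest && !isStopped()) {
      wsClient.send(msgCount.toString)
      Thread.sleep(1000)
    }
  }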

  // TODO: avoid using Synchronization...
  private def connect(): Unit = {
    Try {
      wsClient = createWsClient
    } match {
      case Success(_) =>
        wsClient.connect().map {
          case result if result.isSuccess =>
            sendRequest(true, 10)
          case _ =>
            connect()
        }
      case Failure(ex) =>
        // TODO: how to signal a failure so that it is tried the next time....
        ex.printStackTrace()
    }
  }

  def onStart(): Unit = {
    new Thread(getClass.getSimpleName) {
      override def run(): Unit = connect()
    }.start()
  }

  override def onStop(): Unit =
    if (wsClient != null) wsClient.disconnect()

  private def createWsClient = {
    new DefaultHookupClient(new HookupClientConfig(new URI(wsConfig.wsUrl))) {
      override def receive: Receive = {
        case Disconnected(_) =>
          // TODO: use Logging framework, try reconnecting....
          println(s"the web socket is disconnected")
        case TextMessage(message) =>
          stringMessageHandler(message).foreach(store)
        case JsonMessage(jsValue) =>
          stringMessageHandler(jsValue.toString).foreach(store)
      }
    }
  }
}
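
The TODOs above about reconnecting and signalling a failure could be approached with a bounded retry instead of the unbounded recursive `connect()`. The following is only a minimal sketch under stated assumptions: `Retry` and `withBackoff` are hypothetical names, not part of Spark or of the WebSocket library, and the doubling delay is an arbitrary choice.

```scala
import scala.annotation.tailrec
import scala.util.{Failure, Success, Try}

object Retry {
  // Hypothetical helper: evaluates `attempt` up to `maxAttempts` times,
  // sleeping `delayMillis` before each retry and doubling the delay after every failure.
  @tailrec
  def withBackoff[A](maxAttempts: Int, delayMillis: Long)(attempt: () => A): Try[A] =
    Try(attempt()) match {
      case s @ Success(_) => s
      case Failure(_) if maxAttempts > 1 =>
        Thread.sleep(delayMillis)
        withBackoff(maxAttempts - 1, delayMillis * 2)(attempt)
      // Out of attempts: hand the last failure back to the caller.
      case f @ Failure(_) => f
    }
}
```

Inside the receiver, a final `Failure(ex)` could then be passed to `restart("could not connect", ex)`, which Spark's `Receiver` provides to stop and restart the receiver asynchronously, rather than only printing the stack trace.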

How does this receiver run? Does this Receiver run on a worker node or on the driver node? Is this the right way to do it?

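The reason I want to do this is that the server exposing the WebSocket endpoint needs to be told how many messages I want to receive. If I ask the server for 100 messages, it gives me 100 messages, and so on. So I need a way to schedule this request to the server periodically. Currently I am using Thread.sleep for this. Is that advisable? What are the alternatives?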
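
One commonly used alternative to a `Thread.sleep` loop on the JVM is `java.util.concurrent.ScheduledExecutorService`. The sketch below is an illustration, not a definitive implementation: `PeriodicRequester`, `PeriodicRequesterDemo`, and the `send` callback are hypothetical names, and `send` stands in for something like `wsClient.send(msgCount.toString)`.

```scala
import java.util.concurrent.{CountDownLatch, Executors, ScheduledExecutorService, TimeUnit}
import java.util.concurrent.atomic.AtomicInteger

// Hypothetical helper, not part of Spark or any WebSocket library:
// calls `send(msgCount)` at a fixed interval on a single scheduler thread.
final class PeriodicRequester(send: Int => Unit, msgCount: Int, periodMillis: Long) {
  private val scheduler: ScheduledExecutorService =
    Executors.newSingleThreadScheduledExecutor()

  def start(): Unit = {
    val task = new Runnable {
      // Catch exceptions here: an uncaught exception would cancel all later runs.
      override def run(): Unit =
        try send(msgCount)
        catch { case e: Exception => System.err.println(s"request failed: ${e.getMessage}") }
    }
    scheduler.scheduleAtFixedRate(task, 0L, periodMillis, TimeUnit.MILLISECONDS)
    ()
  }

  // Interrupts the scheduler thread and prevents further runs.
  def stop(): Unit = {
    scheduler.shutdownNow()
    ()
  }
}

object PeriodicRequesterDemo {
  // Runs the requester until at least `expected` requests were sent, then stops it
  // and returns the number of requests actually sent.
  def runDemo(expected: Int): Int = {
    val sent = new AtomicInteger(0)
    val done = new CountDownLatch(expected)
    val requester = new PeriodicRequester(
      _ => { sent.incrementAndGet(); done.countDown() }, msgCount = 10, periodMillis = 20L)
    requester.start()
    done.await(5, TimeUnit.SECONDS)
    requester.stop()
    sent.get()
  }
}
```

In a receiver like the one above, `start()` would presumably be called once the connection succeeds and `stop()` from `onStop()`, so that the scheduling thread does not outlive the receiver.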

0 answers:

No answers