Joining two DStreams to have separate mapWithState states

Date: 2016-05-26 15:52:49

Tags: apache-spark spark-streaming

We are using mapWithState to track incoming Booking events. Once a set of Booking events is complete, I want to publish a new Order event to another Spark stream, which will have its own key and its own state. Bookings are keyed at the account level, orders at the client level. I can create the two custom streams like this:

val bookingsReceiver = new MyJMSReceiver()
val bookingsReceiverStream = ssc.receiverStream(bookingsReceiver)
val ordersReceiver = new MyCustomInMemoryReceiver()
val ordersReceiverStream = ssc.receiverStream(ordersReceiver)

So the mapWithState StateSpec functions are defined like this:

def bookingsStateFunc(batchTime: Time, key: AccountID, value: Option[Booking], state: State[(AccountID, BookingRequest)]): Option[(AccountID, BookingRequest)]

def ordersStateFunc(batchTime: Time, key: ClientID, value: Option[Order], state: State[(ClientID, OrderHistory)]): Option[(ClientID, OrderHistory)]
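
For reference, this is roughly how I wire these functions in (a sketch: it assumes Booking exposes an accountId field and Order a clientId field, which are names I am making up here):

import org.apache.spark.streaming.StateSpec

// Key each input stream, then apply its StateSpec.
val bookingsSpec = StateSpec.function(bookingsStateFunc _)
val bookingsStream = bookingsReceiverStream
  .map(booking => (booking.accountId, booking)) // accountId is an assumed field on Booking
  .mapWithState(bookingsSpec)

val ordersSpec = StateSpec.function(ordersStateFunc _)
val ordersStream = ordersReceiverStream
  .map(order => (order.clientId, order)) // clientId is an assumed field on Order
  .mapWithState(ordersSpec)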

Once a booking is completed I can create a new Order like this:

bookingsStream.foreachRDD { rdd =>
  rdd.filter(_._2.bookingStatus == BookingStatus.Completed).foreach { case (_, bookingRequest) =>
    val order = Order(bookingRequest)
    // What can I do to send the new Order to the Orders stream?
    ordersReceiver.storeOrder(order) // This will not work, as ordersReceiver is not available on the Workers ...
  }
}

I could create an external message queue to publish the new Orders to. But is there a way to link the new data into an existing stream? Or is there a way to collect the new data back on the driver?
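
For the external queue option, this is the kind of thing I have in mind, sketched with Kafka (the spark-streaming-kafka 0.8 direct API; serializeOrder and parseOrder are placeholder helpers I would still have to write):

import kafka.serializer.StringDecoder
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.spark.streaming.kafka.KafkaUtils

// On the workers: publish every completed booking as an Order message.
bookingsStream.foreachRDD { rdd =>
  rdd.filter(_._2.bookingStatus == BookingStatus.Completed).foreachPartition { partition =>
    val props = new java.util.Properties()
    props.put("bootstrap.servers", "broker:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    partition.foreach { case (_, bookingRequest) =>
      // serializeOrder is a placeholder for whatever encoding I end up using
      producer.send(new ProducerRecord[String, String]("orders", serializeOrder(Order(bookingRequest))))
    }
    producer.close()
  }
}

// Back on the driver: consume the topic instead of using MyCustomInMemoryReceiver.
val kafkaParams = Map("metadata.broker.list" -> "broker:9092")
val ordersFromKafka = KafkaUtils
  .createDirectStream[String, String, StringDecoder, StringDecoder](ssc, kafkaParams, Set("orders"))
  .map { case (_, payload) => parseOrder(payload) } // parseOrder is also a placeholder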

This example custom receiver does not work (the instance the driver holds is not the one that actually runs on a worker), but it shows what I have in mind...

import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.receiver.Receiver

class MyCustomInMemoryReceiver extends Receiver[Order](StorageLevel.MEMORY_AND_DISK_2) {
  def onStart() = {}
  def onStop() = {}
  // Intended to be called from outside the receiver to inject new Orders
  def storeOrder(order: Order) = store(order)
}
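
The closest driver-side alternative I can think of is to collect the completed bookings in foreachRDD (whose body runs on the driver) and push them into a queueStream. As far as I know queueStream does not support checkpoint recovery, so this is only a sketch:

import scala.collection.mutable
import org.apache.spark.rdd.RDD

// A queue of RDDs that the orders stream drains, one RDD per batch.
val orderQueue = mutable.Queue[RDD[Order]]()
val ordersQueueStream = ssc.queueStream(orderQueue)

bookingsStream.foreachRDD { rdd =>
  // collect() brings the completed bookings back to the driver
  val orders = rdd
    .filter(_._2.bookingStatus == BookingStatus.Completed)
    .map { case (_, bookingRequest) => Order(bookingRequest) }
    .collect()
  if (orders.nonEmpty) {
    orderQueue.synchronized { orderQueue.enqueue(ssc.sparkContext.parallelize(orders)) }
  }
}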

0 Answers:

There are no answers yet.