We are using mapWithState to track incoming Booking events. Once a group of Booking events is complete, I want to publish a new Order event to another Spark stream, which will have its own key and its own state. Bookings are keyed at the account level and Orders at the client level. I can create the two custom streams like this:
// receiver that pulls Booking events from JMS
val bookingsReceiver = new MyJMSReceiver()
val bookingsReceiverStream = ssc.receiverStream(bookingsReceiver)

// receiver that I would like to feed Order events from within the application
val ordersReceiver = new MyCustomInMemoryReceiver()
val ordersReceiverStream = ssc.receiverStream(ordersReceiver)
The mapWithState StateSpec functions are therefore defined as follows:
def bookingsStateFunc(batchTime: Time, key: AccountID, value: Option[Booking], state: State[(AccountID, BookingRequest)]): Option[(AccountID, BookingRequest)]
def ordersStateFunc(batchTime: Time, key: ClientID, value: Option[Order], state: State[(ClientID, OrderHistory)]): Option[(ClientID, OrderHistory)]
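Roughly, the wiring looks like this (the accountId / clientId accessors used for keying are placeholders, the exact field names are not important here):

val bookingsStateSpec = StateSpec.function(bookingsStateFunc _)
val ordersStateSpec = StateSpec.function(ordersStateFunc _)

// key the receiver streams before mapWithState (placeholder accessors)
val bookingsStateStream = bookingsReceiverStream
  .map(booking => (booking.accountId, booking))
  .mapWithState(bookingsStateSpec)

val ordersStateStream = ordersReceiverStream
  .map(order => (order.clientId, order))
  .mapWithState(ordersStateSpec)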
I can create new Orders once Bookings are completed, like this:
bookingsStateStream.foreachRDD { rdd =>
  rdd.filter(_._2.bookingStatus == BookingStatus.Completed).foreach { case (_, bookingRequest) =>
    val order = Order(bookingRequest)
    // What can I do to send the new Order to the Orders stream?
    ordersReceiver.storeOrder(order) // This will not work, as ordersReceiver is not available on the Workers ...
  }
}
I could create an external message queue and publish the new Orders to it. But is there a way to link the new data to an existing stream? Or is there a way to collect the new data back on the driver?
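The external message queue option would look roughly like the sketch below, using Kafka as an example; the "orders" topic, the broker address and Order.toJson are placeholders:

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// publish completed Orders from the workers, one producer per partition,
// and consume the "orders" topic as a separate input stream elsewhere
bookingsStateStream.foreachRDD { rdd =>
  rdd.filter(_._2.bookingStatus == BookingStatus.Completed).foreachPartition { partition =>
    val props = new Properties()
    props.put("bootstrap.servers", "broker:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    val producer = new KafkaProducer[String, String](props)
    partition.foreach { case (_, bookingRequest) =>
      val order = Order(bookingRequest)
      producer.send(new ProducerRecord("orders", order.clientId.toString, order.toJson))
    }
    producer.close()
  }
}

That works, but it adds infrastructure I would rather avoid if the data can stay inside the application.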
This example custom receiver does not work, but it shows my idea...
class MyCustomInMemoryReceiver extends Receiver[Order](StorageLevel.MEMORY_AND_DISK_2) {
  def onStart() = {}
  def onStop() = {}
  // meant to be called from outside the receiver to push an Order into the stream
  def storeOrder(order: Order) = store(order)
}
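And this is roughly what I mean by collecting the new data back on the driver, sketched with a queue-backed stream (ssc.queueStream is documented mainly for testing, and the collect() limits this to small per-batch volumes, which is part of why I am asking whether there is a better way):

import scala.collection.mutable
import org.apache.spark.rdd.RDD

val orderQueue = new mutable.Queue[RDD[Order]]()    // access may need synchronization
val queuedOrdersStream = ssc.queueStream(orderQueue, oneAtATime = false)

bookingsStateStream.foreachRDD { rdd =>
  // foreachRDD runs on the driver, so enqueueing here is possible
  val newOrders = rdd
    .filter(_._2.bookingStatus == BookingStatus.Completed)
    .map { case (_, bookingRequest) => Order(bookingRequest) }
    .collect()
  if (newOrders.nonEmpty)
    orderQueue.enqueue(ssc.sparkContext.makeRDD(newOrders))
}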