I want to implement a SQL merge join for different Akka streams.
For example, I have 3 classes:
case class A(id: String, as: String)
case class B(a_id: String, bs: String)
case class C(id: String, as: String, bs: String)
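I have two sources, Source[A] and Source[B], sorted by id and a_id respectively, and I want a Sink[C] that receives the elements joined on id=a_id. I cannot work out how to implement this.
Example streams:
Source[A] contains: A(1, "a1"), A(2, "a2"), A(3, "a3_1"), A(3, "a3_2"), A(4, "a4")
Source[B] contains: B(2, "b2"), B(3, "b3")
Sink[C] must receive: C(2, "a2", "b2"), C(3, "a3_1", "b3"), C(3, "a3_2", "b3")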
Answer 0 (score: 0)
The example above does not match the OneToManyMergeJoin specification.
The correct one is:
import akka.NotUsed
import akka.stream.scaladsl.Source

case class A(id: Int, as: String)
case class B(a_id: Int, bs: String)
case class C(id: Int, as: String, bs: String)
val source1: Source[A, NotUsed] = Source(
List(A(1, "a1"), A(2, "a2"), A(3, "a3"), A(4, "a4"))
)
val source2: Source[B, NotUsed] = Source(
List(B(2, "b2"), B(3, "b3_1"), B(3, "b3_2"))
)
One of the sources must be DISTINCT on the join key, and both must be sorted by that key. We can then use the following stage:
import akka.stream.{Attributes, FanInShape2}
import akka.stream.stage.{GraphStage, GraphStageLogic, StageLogging}

class OneToManyMergeJoin[Distinct, Duplicated, O](
    val zipper: (Distinct, Duplicated) => O,
    val comparator: (Distinct, Duplicated) => Int
) extends GraphStage[FanInShape2[Distinct, Duplicated, O]] {
  override val shape: FanInShape2[Distinct, Duplicated, O] = new FanInShape2("OneToManyMergeJoin")
  private val left = shape.in0
  private val right = shape.in1
  private val out = shape.out

  override def createLogic(inheritedAttributes: Attributes) = new GraphStageLogic(shape) with StageLogging {
    setHandler(left, ignoreTerminateInput)
    setHandler(right, ignoreTerminateInput)
    setHandler(out, eagerTerminateOutput)

    // Last element read from each input.
    var leftValue: Distinct = _
    var rightValue: Duplicated = _

    // Equal keys: emit the joined element, then read the next right element.
    // Otherwise advance the side whose key is smaller.
    def dispatch(l: Distinct, r: Duplicated): Unit = {
      val c = comparator(l, r)
      if (c == 0) {
        emit(out, zipper(l, r), readR)
      } else {
        if (c < 0) readL() else readR()
      }
    }

    private val dispatchR = { v: Duplicated =>
      rightValue = v
      dispatch(leftValue, rightValue)
    }

    private val dispatchL = { v: Distinct =>
      leftValue = v
      dispatch(leftValue, rightValue)
    }

    // The second argument of read runs when that input has finished.
    lazy val readR: () => Unit = () => read(right)(dispatchR, () => if (comparator(leftValue, rightValue) < 0) readL() else completeStage())
    lazy val readL: () => Unit = () => read(left)(dispatchL, () => if (comparator(leftValue, rightValue) > 0) readR() else completeStage())

    override def preStart(): Unit = {
      // all fan-in stages need to eagerly pull all inputs to get cycles started
      pull(right)
      read(left)(
        l => {
          leftValue = l
          read(right)(dispatchR, () => completeStage())
        },
        () => completeStage()
      )
    }
  }
}
This works as an implementation of a merge join with an equals predicate.
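The comparison logic can be followed without any stream machinery. Below is a minimal in-memory sketch of a one-to-many merge join over two sorted lists, assuming the first list is distinct on the join key. The names MergeJoinSketch and mergeJoin are hypothetical and chosen only for illustration; this is not the stage itself, only the same three-way branch on the comparator result.

```scala
// In-memory sketch of a one-to-many merge join (no Akka involved).
// MergeJoinSketch and mergeJoin are hypothetical names used for illustration.
object MergeJoinSketch {
  final case class A(id: Int, as: String)
  final case class B(a_id: Int, bs: String)
  final case class C(id: Int, as: String, bs: String)

  // Assumes both lists are sorted by the join key and `distinct` has unique keys.
  def mergeJoin[L, R, O](distinct: List[L], duplicated: List[R])(
      zipper: (L, R) => O,
      comparator: (L, R) => Int
  ): List[O] = {
    @annotation.tailrec
    def loop(ls: List[L], rs: List[R], acc: List[O]): List[O] = (ls, rs) match {
      case (l :: lTail, r :: rTail) =>
        val c = comparator(l, r)
        if (c == 0) loop(ls, rTail, zipper(l, r) :: acc) // match: keep left, advance right
        else if (c < 0) loop(lTail, rs, acc)             // left key is smaller: advance left
        else loop(ls, rTail, acc)                        // right key is smaller: advance right
      case _ => acc.reverse // one side is exhausted: no further matches are possible
    }
    loop(distinct, duplicated, Nil)
  }

  def main(args: Array[String]): Unit = {
    val as = List(A(1, "a1"), A(2, "a2"), A(3, "a3"), A(4, "a4"))
    val bs = List(B(2, "b2"), B(3, "b3_1"), B(3, "b3_2"))
    val joined = mergeJoin(as, bs)(
      (a, b) => C(a.id, a.as, b.bs),
      (a, b) => Integer.compare(a.id, b.a_id)
    )
    joined.foreach(println)
  }
}
```

Each input is traversed once, which is why both sides must already be sorted and why only one side may repeat a key: after a match only the duplicated side is advanced.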
It can be used with a first Distinct source, a second Duplicated source, a zipper function and a comparator function:
import akka.NotUsed
import akka.stream.SourceShape
import akka.stream.scaladsl.{GraphDSL, Sink, Source}

def oneToManyMergeJoin[Distinct, Duplicated, O](s1: Source[Distinct, NotUsed], s2: Source[Duplicated, NotUsed])(
    zipper: (Distinct, Duplicated) => O,
    comparator: (Distinct, Duplicated) => Int
): Source[O, NotUsed] = {
  Source.fromGraph(GraphDSL.create() { implicit builder =>
    import GraphDSL.Implicits._
    val m = builder.add(new OneToManyMergeJoin[Distinct, Duplicated, O](zipper, comparator))
    s1 ~> m.in0
    s2 ~> m.in1
    SourceShape(m.out)
  })
}

// Needs an implicit Materializer and a logger named log in scope.
oneToManyMergeJoin(source1, source2)(
  (a, b) => C(a.id, a.as, b.bs),
  (a, b) => Ordering.Int.compare(a.id, b.a_id)
).runWith(Sink.foreach[C](p => log.debug(s"Sink: $p")))
Result:
DEBUG Sink: C(2,a2,b2)
DEBUG Sink: C(3,a3,b3_1)
DEBUG Sink: C(3,a3,b3_2)
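To cross-check this output without running a stream, the same equi-join can be computed over the plain lists with a nested-loop for-comprehension. This is only a reference sketch for small inputs, not a merge join; JoinCrossCheck and referenceJoin are hypothetical names introduced here.

```scala
// Reference nested-loop equi-join over in-memory lists, for checking only.
// JoinCrossCheck and referenceJoin are hypothetical names used for illustration.
object JoinCrossCheck {
  final case class A(id: Int, as: String)
  final case class B(a_id: Int, bs: String)
  final case class C(id: Int, as: String, bs: String)

  // Compares every A with every B, so it does not depend on sort order.
  def referenceJoin(as: List[A], bs: List[B]): List[C] =
    for {
      a <- as
      b <- bs
      if a.id == b.a_id
    } yield C(a.id, a.as, b.bs)

  def main(args: Array[String]): Unit = {
    val as = List(A(1, "a1"), A(2, "a2"), A(3, "a3"), A(4, "a4"))
    val bs = List(B(2, "b2"), B(3, "b3_1"), B(3, "b3_2"))
    referenceJoin(as, bs).foreach(println)
  }
}
```

Comparing the elements collected from the stream against such a reference list is one simple way to test the stage on small inputs.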
Why is this needed? One use case is merging Cassandra tables that are not fully denormalized (but are already sorted).