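Suppose we have two lists of objects, a Seq[A] and a Seq[B], and we want to join them on some condition (A, B) => Boolean. A single element of the first list may have several matching elements in the second list. By a full join we mean that we also want to know which elements of either list have no corresponding partner.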
So the signature would be:
def fullJoin[A, B](left: Seq[A], right: Seq[B], joinCondition: (A, B) => Boolean): (Seq[A], Seq[B], Seq[(A, B)])
Or, if we take advantage of Cats' Ior type:
def fullJoin[A, B](left: Seq[A], right: Seq[B], joinCondition: (A, B) => Boolean): Seq[Ior[A, B]]
Example:
scala> fullJoin[Int, Int](List(1,2), List(3,4,4), {_ * 2 == _ })
res4: (Seq[Int], Seq[Int], Seq[(Int, Int)]) = (List(1),List(3),List((2,4), (2,4)))
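To illustrate the second, Ior-shaped signature, here is a minimal sketch. To avoid a Cats dependency, it defines a small hypothetical ADT (JoinRow with the cases OnlyLeft, OnlyRight and Both) that mirrors the shape of cats.data.Ior (Ior.Left, Ior.Right, Ior.Both). With Cats on the classpath, the three cases would map directly onto those. The object name IorStyleJoin is likewise only for illustration. For the example above, this sketch returns OnlyLeft(1), then Both(2, 4) twice, then OnlyRight(3).

```scala
// Hypothetical stand-in for cats.data.Ior, so the sketch has no Cats dependency.
sealed trait JoinRow[+A, +B] extends Product with Serializable
final case class OnlyLeft[A](a: A) extends JoinRow[A, Nothing]
final case class OnlyRight[B](b: B) extends JoinRow[Nothing, B]
final case class Both[A, B](a: A, b: B) extends JoinRow[A, B]

object IorStyleJoin {
  // Quadratic full join that returns one flat Seq of rows instead of a triple.
  def fullJoin[A, B](left: Seq[A], right: Seq[B], joinCondition: (A, B) => Boolean): Seq[JoinRow[A, B]] = {
    val rs = right.toIndexedSeq
    // For each left element, the positions of all right elements it matches.
    val matchesPerLeft: Seq[Seq[Int]] =
      left.map(a => rs.indices.filter(j => joinCondition(a, rs(j))))
    val matchedRight: Set[Int] = matchesPerLeft.flatten.toSet

    // An unmatched left element becomes OnlyLeft; a matched one yields one Both per match.
    val leftRows: Seq[JoinRow[A, B]] = left.zip(matchesPerLeft).flatMap {
      case (a, Seq()) => Seq(OnlyLeft(a))
      case (a, js)    => js.map(j => Both(a, rs(j)))
    }
    // Right elements that never matched become OnlyRight.
    val rightRows: Seq[JoinRow[A, B]] =
      rs.indices.filterNot(matchedRight.contains).map(j => OnlyRight(rs(j)))
    leftRows ++ rightRows
  }
}
```

Putting all unmatched right elements at the end is an arbitrary choice. The question does not specify an order for the Ior result.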
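The idea is exactly the same as joining tables in SQL.
Is there any similar utility method in the standard library? If not, let's discuss an elegant solution. Performance is not a concern here: quadratic complexity, as with nested loops, is fine.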
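Because quadratic cost is acceptable, the most direct approach is a plain nested loop. The sketch below is one possible version, and the object name NestedLoopJoin is only illustrative. It records which *positions* matched rather than which values matched, so the result does not rely on how A and B implement equals and hashCode.

```scala
object NestedLoopJoin {
  // Quadratic full join: a nested loop that marks every position taking part in a match.
  def fullJoin[A, B](left: Seq[A], right: Seq[B], joinCondition: (A, B) => Boolean): (Seq[A], Seq[B], Seq[(A, B)]) = {
    val ls = left.toIndexedSeq
    val rs = right.toIndexedSeq
    val leftHit = Array.fill(ls.size)(false)
    val rightHit = Array.fill(rs.size)(false)
    val pairs = Seq.newBuilder[(A, B)]
    for (i <- ls.indices; j <- rs.indices if joinCondition(ls(i), rs(j))) {
      leftHit(i) = true
      rightHit(j) = true
      pairs += ((ls(i), rs(j)))
    }
    // Positions never marked are the elements without a partner.
    val unmatchedLeft = ls.indices.filterNot(i => leftHit(i)).map(i => ls(i))
    val unmatchedRight = rs.indices.filterNot(j => rightHit(j)).map(j => rs(j))
    (unmatchedLeft, unmatchedRight, pairs.result())
  }
}
```

On the question's example this gives (Vector(1), Vector(3), List((2,4), (2,4))), which is the same contents as res4 above.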
Answer 0 (score: 3)
Here is a more concise solution that uses built-in Scala library features:
def fullJoin[A, B](left: Seq[A], right: Seq[B], joinCondition: (A, B) => Boolean): (Seq[A], Seq[B], Seq[(A, B)]) = {
  // Nested loop over both sequences: keep every pair that satisfies the condition.
  val matched = for (a <- left; b <- right if joinCondition(a, b)) yield (a, b)
  // Elements that took part in at least one match (compared via equals/hashCode).
  val matchedLeft = matched.map(_._1).toSet
  val matchedRight = matched.map(_._2).toSet
  (left.filterNot(matchedLeft.contains), right.filterNot(matchedRight.contains), matched)
}
Answer 1 (score: 0)
I think the full join problem can be solved with left joins. It isn't really optimized, but here is my solution:
// Full join = the matched pairs of one left join, plus the unmatched
// elements of a left join in each direction.
def fullJoin[A, B](left: Seq[A], right: Seq[B], joinCondition: (A, B) => Boolean): (Seq[A], Seq[B], Seq[(A, B)]) = {
  val (notJoinedLeft, joined) = leftJoin(left, right, joinCondition)
  // Swap the arguments so that the right list plays the "left" role.
  val (notJoinedRight, _) = leftJoin(right, left, (b: B, a: A) => joinCondition(a, b))
  (notJoinedLeft, notJoinedRight, joined)
}
// Returns the left elements without a match, and all matched (a, b) pairs.
def leftJoin[A, B](left: Seq[A], right: Seq[B], joinCondition: (A, B) => Boolean): (Seq[A], Seq[(A, B)]) = {
  val matchingResult: Seq[Either[A, Seq[(A, B)]]] = for {
    a <- left
  } yield {
    right.filter(joinCondition.curried(a)) match {
      case Seq() => Left(a)
      // No `: Seq[B]` type test here: B is erased at runtime, so that check would be unchecked.
      case matchedBs => Right(matchedBs.map((a, _)))
    }
  }
  val (notMatched, matched) = partition(matchingResult)
  (notMatched, matched.flatten)
}
// Splits a Seq of Either into its Left values and its Right values
// (avoids the deprecated `.right.get` projection).
def partition[A, B](list: Seq[Either[A, B]]): (Seq[A], Seq[B]) =
  (list.collect { case Left(a) => a }, list.collect { case Right(b) => b })
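This also partly answers the standard-library part of the question. Assuming Scala 2.13 or later, the collections library provides partitionMap, which makes a hand-written partition helper unnecessary. The following sketch shows the same leftJoin written with it. The wrapper object name LeftJoin213 is only for illustration.

```scala
object LeftJoin213 {
  // Same left-join idea, assuming Scala 2.13+, where partitionMap is built in.
  def leftJoin[A, B](left: Seq[A], right: Seq[B], joinCondition: (A, B) => Boolean): (Seq[A], Seq[(A, B)]) = {
    // Each left element goes to the Left side (no match) or the Right side (its pairs).
    val (notMatched, matched) = left.partitionMap { a =>
      right.filter(b => joinCondition(a, b)) match {
        case Seq()     => Left(a)
        case matchedBs => Right(matchedBs.map(b => (a, b)))
      }
    }
    (notMatched, matched.flatten)
  }
}
```

The fullJoin from this answer can call this version unchanged, because the signature is the same.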