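I want to aggregate compatible elements in a sequence, i.e. transform a Seq[T] into a Seq[Seq[T]] in which the elements of each subsequence are compatible with each other, while preserving the original order of the seq. For example, from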
case class X(i: Int, n: Int) {
def canJoin(that: X): Boolean = this.n == that.n
override val toString = i + "." + n
}
val xs = Seq(X(1, 1), X(2, 3), X(3, 3), X(4, 3), X(5, 1), X(6, 2), X(7, 2), X(8, 1))
/* xs = List(1.1, 2.3, 3.3, 4.3, 5.1, 6.2, 7.2, 8.1) */
I want to obtain
val js = join(xs)
/* js = List(List(1.1), List(2.3, 3.3, 4.3), List(5.1), List(6.2, 7.2), List(8.1)) */
I tried to do this in a functional way, but got stuck halfway:
def split(seq: Seq[X]): (Seq[X], Seq[X]) = seq.span(_ canJoin seq.head)
def join(seq: Seq[X]): Seq[Seq[X]] = {
var pp = Seq[Seq[X]]()
var s = seq
while (!s.isEmpty) {
val (p, r) = split(s)
pp :+= p
s = r
}
pp
}
I am happy with split, but join seems a bit too long.
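For readers unfamiliar with span: it cuts a sequence at the first element that fails the predicate, so split returns the leading run of elements compatible with the head, plus the remainder. A minimal sketch of that behaviour follows; the wrapper object SplitDemo is a hypothetical name used only for illustration, and toString is written with string interpolation instead of the original concatenation.

```scala
object SplitDemo {
  case class X(i: Int, n: Int) {
    def canJoin(that: X): Boolean = this.n == that.n
    override def toString: String = s"$i.$n"
  }

  // Leading run of elements compatible with the head, and the rest.
  // On an empty input the predicate is never called, so seq.head is
  // never evaluated and split returns two empty sequences.
  def split(seq: Seq[X]): (Seq[X], Seq[X]) = seq.span(_ canJoin seq.head)

  val xs = Seq(X(1, 1), X(2, 3), X(3, 3), X(4, 3), X(5, 1))
  val (first, rest) = split(xs)      // first  = List(1.1)
  val (second, rest2) = split(rest)  // second = List(2.3, 3.3, 4.3)
}
```

Calling split repeatedly on the remainder is exactly what the while loop in join does.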
This looks like a standard task to me, which leads me to my question:
def join(xs: Seq[X]): Seq[Seq[X]] = {
@annotation.tailrec
def jointr(pp: Seq[Seq[X]], rem: Seq[X]): Seq[Seq[X]] = {
val (p, r) = split(rem)
val pp2 = pp :+ p
if (r.isEmpty) pp2 else jointr(pp2, r)
}
jointr(Seq(), xs)
}
Answer 0 (score: 8)
def join(seq: Seq[X]): Seq[Seq[X]] = {
if (seq.isEmpty) return Seq()
val (p,r) = split(seq)
Seq(p) ++ join(r)
}
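Note that the recursive call here is not in tail position, so the stack depth grows with the number of groups. If that matters, one possible variation (a sketch, not part of this answer) is to collect the groups by prepending to a List accumulator and reversing once at the end. The names JoinAcc and loop are made up for this illustration.

```scala
object JoinAcc {
  case class X(i: Int, n: Int) {
    def canJoin(that: X): Boolean = this.n == that.n
  }

  def join(seq: Seq[X]): Seq[Seq[X]] = {
    // acc holds the groups found so far in reverse order,
    // rem is the part of the input not yet grouped.
    @annotation.tailrec
    def loop(acc: List[Seq[X]], rem: Seq[X]): List[Seq[X]] =
      if (rem.isEmpty) acc.reverse
      else {
        val (p, r) = rem.span(_ canJoin rem.head)
        loop(p :: acc, r) // constant-time prepend, call in tail position
      }
    loop(Nil, seq)
  }
}
```

Unlike the tail-recursive version in the question, this one also handles an empty input, because the emptiness check happens before span is called.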
Answer 1 (score: 4)
Here is a foldLeft version:
def join(seq: Seq[X]) = seq.reverse.foldLeft(Nil: List[List[X]]) {
case ((top :: group) :: rest, x) if x canJoin top =>
(x :: top :: group) :: rest
case (list, x) => (x :: Nil) :: list
}
And a foldRight version (in this case you don't need to reverse the list):
def join(seq: Seq[X]) = seq.foldRight(Nil: List[List[X]]) {
case (x, (top :: group) :: rest) if x canJoin top =>
(x :: top :: group) :: rest
case (x, list) => (x :: Nil) :: list
}
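Since the question is phrased in terms of Seq[T], the same fold can be written once for any element type by passing the compatibility test as a parameter. The following is a sketch under that assumption; GroupWhile and groupWhile are hypothetical names, not part of the standard library.

```scala
object GroupWhile {
  // Groups adjacent elements: x joins the group to its right
  // when p(x, firstElementOfThatGroup) holds, otherwise it
  // starts a new group.
  def groupWhile[T](xs: Seq[T])(p: (T, T) => Boolean): List[List[T]] =
    xs.foldRight(List.empty[List[T]]) {
      case (x, (top :: group) :: rest) if p(x, top) =>
        (x :: top :: group) :: rest
      case (x, acc) =>
        (x :: Nil) :: acc
    }
}
```

With this helper, the join from the question would be GroupWhile.groupWhile(xs)(_ canJoin _), and GroupWhile.groupWhile(Seq(1, 1, 2, 3, 3))(_ == _) yields List(List(1, 1), List(2), List(3, 3)).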
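Answer 2 (score: 3)
Because I have too much time on my hands ;-), I asked myself what the running times of the different approaches are, to get a feeling for whether heavyweight constructs lurk behind the lightweight syntax.
So I created a micro benchmark that measures the running time for three sequences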
(1, 3, 3, 3, 1, 2, 2, 1)
(1, 2, 3, 4, 5, 6, 7, 8, 8, 8, 8, 8, 7, 6, 5, 4, 3, 3, 3, 2, 1, 2, 3)
(2, 2, 3, 4, 5, 6, 7, 8, 8, 8, 8, 8, 8, 8, 8, 7, 6, 5, 4, 4, 4, 4, 3, 3, 3, 2, 1)
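and got the following results:
When incorporating the results into my real project, I ran into inconsistencies in the benchmark. So I repeated the benchmark with more warm-up rounds (now 1000), so that the JIT compiler can get the most out of the code. This shuffled the results and brought me a new favorite: X7 (pimp my lib) = happy without regret. The List version X8 (reverse.foldLeft) is now very fast as well.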
Nr  Approach                         Running time (ns)  Contributor
X2  poor.reference.impl              15.202
X1  original while loop               8.166
X3  tail recursion                    7.473
X4  recursion with ++                 6.671             Peter Schmitz
X5  simplified recursion with ++      6.161             Peter Schmitz
X6  foldRight                         4.083             tenshi
X7  pimp my lib                       1.677             Rex Kerr
X8  reverse.foldLeft                  1.349             tenshi
Earlier results, from before the extra warm-up rounds:
Nr  Approach                         Running time (ns)  Contributor
X2  poor.reference.impl              2.972.015
X7  pimp my lib                      1.185.599          Rex Kerr
X3  tail recursion                   1.027.008
X8  reverse.foldLeft                   643.840          tenshi
X6  foldRight                          608.112          tenshi
X1  original while loop                564.726
X4  recursion with ++                  468.478          Peter Schmitz
X5  simplified recursion with ++       447.699          Peter Schmitz
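The benchmark code itself is not shown here. As a rough idea of what "warm-up rounds followed by a timed run" can look like, here is a minimal sketch based on System.nanoTime. The names BenchSketch and avgNanos, and the choice to report an average over several timed runs, are assumptions made for this illustration and not the harness that produced the numbers above.

```scala
object BenchSketch {
  // Evaluates `body` warmUp times without timing it, so the JIT
  // compiler gets a chance to optimize the code, then returns the
  // average wall-clock time in nanoseconds over `runs` evaluations.
  def avgNanos(warmUp: Int, runs: Int)(body: => Any): Long = {
    var i = 0
    while (i < warmUp) { body; i += 1 }
    val start = System.nanoTime()
    i = 0
    while (i < runs) { body; i += 1 }
    (System.nanoTime() - start) / runs
  }
}
```

A call could look like BenchSketch.avgNanos(1000, 1000)(join(xs)). Hand-rolled timings like this are sensitive to JIT and garbage collection effects, which fits the inconsistencies described above.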
// in 15.202 ns
import collection.mutable.ArrayBuffer
def join2(seq: Seq[X]): Seq[Seq[X]] = {
var pp = Seq[ArrayBuffer[X]](ArrayBuffer(seq(0)))
for (i <- 1 until seq.size) {
if (seq(i) canJoin seq(i - 1)) {
pp.last += seq(i)
} else {
pp :+= ArrayBuffer(seq(i))
}
}
pp
}
// in 8.166 ns
def join(xs: Seq[X]): Seq[Seq[X]] = {
var xss = Seq.empty[Seq[X]]
var s = xs
while (!s.isEmpty) {
val (p, r) = split(s)
xss :+= p
s = r
}
xss
}
This is the original imperative approach from the beginning of the question.
// in 7.473 ns
def join(xs: Seq[X]): Seq[Seq[X]] = {
@annotation.tailrec
def jointr(xss: Seq[Seq[X]], rxs: Seq[X]): Seq[Seq[X]] = {
val (g, r) = split(rxs)
val xsn = xss :+ g
if (r.isEmpty) xsn else jointr(xsn, r)
}
jointr(Seq(), xs)
}
// in 6.671 ns
def join(seq: Seq[X]): Seq[Seq[X]] = {
if (seq.isEmpty) return Seq()
val (p, r) = split(seq)
Seq(p) ++ join(r)
}
// in 6.161 ns
def join(xs: Seq[X]): Seq[Seq[X]] = if (xs.isEmpty) Seq() else {
val (p, r) = split(xs)
Seq(p) ++ join(r)
}
The simplified version is almost identical, but still a little faster.
// in 4.083 ns
def join(xs: Seq[X]) = xs.foldRight(Nil: List[List[X]]) {
case (x, (top :: group) :: rest) if x canJoin top => (x :: top :: group) :: rest
case (x, list) => (x :: Nil) :: list
}
This tries to avoid the reverse, but foldRight seems to be worse than reverse.foldLeft for lists.
// in 1.677 ns
import collection.generic.CanBuildFrom
class GroupingCollection[A, C, D[C]](ca: C)(
implicit c2i: C => Iterable[A],
cbf: CanBuildFrom[C, C, D[C]],
cbfi: CanBuildFrom[C, A, C]) {
def groupedWhile(p: (A, A) => Boolean): D[C] = {
val it = c2i(ca).iterator
val cca = cbf()
if (!it.hasNext) cca.result
else {
val as = cbfi()
var olda = it.next
as += olda
while (it.hasNext) {
val a = it.next
if (p(olda, a)) as += a
else { cca += as.result; as.clear; as += a }
olda = a
}
cca += as.result
}
cca.result
}
}
implicit def collections_have_grouping[A, C[A]](ca: C[A])(
implicit c2i: C[A] => Iterable[A],
cbf: CanBuildFrom[C[A], C[A], C[C[A]]],
cbfi: CanBuildFrom[C[A], A, C[A]]) = {
new GroupingCollection[A, C[A], C](ca)(c2i, cbf, cbfi)
}
// xs.groupedWhile(_ canJoin _)
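The version above is written against CanBuildFrom, which is no longer available in the collections library from Scala 2.13 on. If keeping the original collection type is not required, the same extension-method idea can be sketched with an implicit class (Scala 2.10 or later) that always returns a List[List[A]]. This is a simplified assumption-laden variant, not a drop-in replacement; GroupingSyntax and GroupedWhileOps are names invented for this example.

```scala
object GroupingSyntax {
  implicit class GroupedWhileOps[A](xs: Iterable[A]) {
    // Starts a new group whenever p(previous, current) is false.
    def groupedWhile(p: (A, A) => Boolean): List[List[A]] = {
      val groups = List.newBuilder[List[A]]
      val it = xs.iterator
      if (it.hasNext) {
        var prev = it.next()
        var current = List.newBuilder[A]
        current += prev
        while (it.hasNext) {
          val a = it.next()
          if (!p(prev, a)) {
            // close the finished group and open a fresh one
            groups += current.result()
            current = List.newBuilder[A]
          }
          current += a
          prev = a
        }
        groups += current.result()
      }
      groups.result()
    }
  }
}
```

After import GroupingSyntax._ the call site stays as concise as before: xs.groupedWhile(_ canJoin _).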
// in 1.349 ns
def join(xs: Seq[X]) = xs.reverse.foldLeft(Nil: List[List[X]]) {
case ((top :: group) :: rest, x) if x canJoin top => (x :: top :: group) :: rest
case (list, x) => (x :: Nil) :: list
}
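The different approaches (X1, X3, X4, X5, X6) all play in the same league.
Because X7 (pimp my lib) allows the very concise usage xs.groupedWhile(_ canJoin _), and because the necessary code can be hidden away in my own util lib, I decided to use it in my real project.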