Question

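I'm trying to take an iterator of strings and turn it into an iterator of collections of strings, split according to an arbitrary splitting function.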

So say I have

val splitter: String => Boolean = s => s.isEmpty

and then I want it to take

val data = List("abc", "def", "", "ghi", "jkl", "mno", "", "pqr").iterator

and

def f[A] (input: Iterator[A], splitFcn: A => Boolean): Iterator[X[A]]

where X can be any collection-like class you like, as long as it can be converted to a Seq, such that

f(data, splitter).foreach(x => println(x.toList))

outputs

    List("abc", "def")
    List("ghi", "jkl", "mno")
    List("pqr")

Is there a clean way to do this that doesn't require collecting the input iterator's results entirely into memory?

Answer 1

This should do what you want:

  val splitter: String => Boolean = s => s.isEmpty
  val data = List("abc", "def", "", "ghi", "jkl", "", "mno", "pqr")

  def splitList[A](l: List[A], p: A => Boolean):List[List[A]] = {
    l match {
      case Nil => Nil
      case _ =>
        val (h, t) = l.span(a => !p(a))
        h :: splitList(t.drop(1), p)
    }
  }

  println(splitList(data, splitter))
//prints List(List(abc, def), List(ghi, jkl), List(mno, pqr))
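Note that splitList is not tail-recursive (`h :: splitList(...)` has to wait for the recursive call to return), so the stack depth grows with the number of groups, and a list with very many groups could overflow the stack. A minimal tail-recursive sketch of the same span-based idea, assuming an accumulator that is reversed at the end (`splitListTR` is a hypothetical name, not from the answer):

```scala
import scala.annotation.tailrec

// Hypothetical tail-recursive variant of splitList: same span-based idea,
// but finished groups go into an accumulator that is reversed at the end.
def splitListTR[A](l: List[A], p: A => Boolean): List[List[A]] = {
  @tailrec
  def loop(rest: List[A], acc: List[List[A]]): List[List[A]] = rest match {
    case Nil => acc.reverse
    case _ =>
      // h = elements before the next separator, t = the separator and everything after it
      val (h, t) = rest.span(a => !p(a))
      loop(t.drop(1), h :: acc) // drop(1) discards the separator itself
  }
  loop(l, Nil)
}

val data = List("abc", "def", "", "ghi", "jkl", "", "mno", "pqr")
println(splitListTR(data, (s: String) => s.isEmpty))
// List(List(abc, def), List(ghi, jkl), List(mno, pqr))
```

It should return the same groups as splitList, but like splitList it still takes a fully materialized List, so on its own it does not meet the question's memory constraint.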

Answer 2

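Update #2: Travis Brown answered another question using Scalaz-streams, which is an interesting approach that might help you. I've only just started looking at the package, but I was quickly able to use it to read data from a file containing this: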

abc
def

ghi
jkl
mno

pqr

and produce another file that looks like this:

Vector(abc, def, )
Vector(ghi, jkl, mno, )
Vector(pqr)

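The library only keeps the Vector currently being accumulated in memory. Here is my code (consider it dangerous, since I know next to nothing about Scalaz-streams):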

import scalaz.stream._
io.linesR("/tmp/a")
  .pipe( process1.chunkBy(_.nonEmpty) )
  .map( _.toString + "\n" )
  .pipe(text.utf8Encode)
  .to( io.fileChunkW("/tmp/b") )
  .run.run

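The key to your task is chunkBy(_.nonEmpty), which accumulates lines into a Vector until it hits an empty line. The empty line itself is kept as the last element of the chunk, which is why the first two Vectors in the output above end with ", ". I'm not sure why you have to call run twice.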
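To illustrate the chunking behavior described above without the scalaz-stream dependency, here is a plain-Scala sketch over an ordinary Iterator. It is not scalaz-stream's implementation: `chunkLike` is a hypothetical name, and its semantics (the first element that fails the predicate closes the chunk and is kept as its last element) are inferred from the output shown above. Only the chunk being built is held in memory.

```scala
// Hypothetical plain-Scala sketch mimicking the chunking shown above:
// keep adding elements to the current chunk while keepGoing(a) is true;
// the first element for which it is false closes the chunk and is kept in it.
def chunkLike[A](in: Iterator[A], keepGoing: A => Boolean): Iterator[Vector[A]] =
  Iterator.continually {
    val chunk = Vector.newBuilder[A]
    var done = false
    while (!done && in.hasNext) {
      val a = in.next()
      chunk += a
      done = !keepGoing(a)
    }
    chunk.result()
  }.takeWhile(_.nonEmpty) // an empty chunk only occurs once the input is exhausted

val lines = Iterator("abc", "def", "", "ghi", "jkl", "mno", "", "pqr")
chunkLike(lines, (s: String) => s.nonEmpty).foreach(println)
// Vector(abc, def, )
// Vector(ghi, jkl, mno, )
// Vector(pqr)
```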

Older material below.

Update #1: Argh! I just noticed the extra constraint that it can't all be read into memory. This solution won't work for you; you want Iterators or Streams.

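I'd guess you want to enrich Traversable. With the function in a separate parameter list, the compiler can infer the types. For performance, you probably want to make only a single pass over the data. And to avoid blowing the stack on large data sets (and for performance), you don't want any recursion that isn't tail recursive. Given this enrichment: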

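import scala.annotation.tailrec

// In a .scala source file this implicit value class must be defined inside an object;
// a worksheet provides such a wrapper.
implicit class EnrichedTraversable[A]( val xs:Traversable[A] ) extends AnyVal {
  def splitWhere( f: A => Boolean ) = {
    // Single pass: build the current group, closing it whenever f(x) holds.
    @tailrec
    def loop( xs:Traversable[A], group:Seq[A], groups:Seq[Seq[A]] ):Seq[Seq[A]] =
      if ( xs.isEmpty ) {
        groups :+ group  // the last group is always emitted, even when empty
      } else {
        val x    = xs.head
        val rest = xs.tail
        if ( f(x) ) loop( rest, Vector(), groups :+ group )  // separator: start a new group
        else        loop( rest, group :+ x, groups )
      }
    loop( xs, Vector(), Vector() )
  }
}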

You can then do:

List("a","b","","c","d") splitWhere (_.isEmpty)

Here are some tests you may want to check to make sure the semantics are what you want (personally, I like split to behave this way):

val xs = List("a","b","","d","e","","f","g")    //> xs  : List[String] = List(a, b, "", d, e, "", f, g)
xs               splitWhere (_.isEmpty)         //> res0: Seq[Seq[String]] = Vector(Vector(a, b), Vector(d, e), Vector(f, g))
List("a","b","") splitWhere (_.isEmpty)         //> res1: Seq[Seq[String]] = Vector(Vector(a, b), Vector())
List("")         splitWhere (_.isEmpty)         //> res2: Seq[Seq[String]] = Vector(Vector(), Vector())
List[String]()   splitWhere (_.isEmpty)         //> res3: Seq[Seq[String]] = Vector(Vector())
Vector("a","b","","c") splitWhere (_.isEmpty)   //> res4: Seq[Seq[String]] = Vector(Vector(a, b), Vector(c))

Answer 3

I think Stream is what you want, since Streams are evaluated lazily (not everything is held in memory at once).

def split[A](inputStream: Stream[A], splitter: A => Boolean): Stream[List[A]] = {
  var accumulationList: List[A] = Nil
  def loop(inputStream: Stream[A]): Stream[List[A]] = {
    if (inputStream.isEmpty) {
      if (accumulationList.isEmpty)
        Stream.empty[List[A]]
      else
        accumulationList.reverse #:: Stream.empty[List[A]]
    } else if (splitter(inputStream.head)) {
      val outputList = accumulationList.reverse
      accumulationList = Nil
      if (outputList.isEmpty)
        loop(inputStream.tail)
      else
        outputList #:: loop(inputStream.tail)
    } else {
      accumulationList = inputStream.head :: accumulationList
      loop(inputStream.tail)
    }
  }
  loop(inputStream)
}

val splitter: String => Boolean = _.isEmpty
val list = List("asdf", "aa", "", "fw", "", "wfwf", "", "")
val stream = split(list.toStream, splitter)
stream foreach println

The output is:

List(asdf, aa)
List(fw)
List(wfwf)

Edit:
I haven't looked into it carefully, but I think my recursive loop method could be replaced with foldLeft or foldRight.
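On that edit: a foldLeft over the Stream has to consume the entire input before it returns anything, so it would give up the laziness. A var-free sketch that stays lazy, assuming the same semantics as split above (separators are dropped and empty groups are skipped), can use dropWhile/takeWhile instead of a fold. `splitStream` is a hypothetical name. Since Scala 2.13, Stream is deprecated in favor of LazyList, which should work as a replacement here.

```scala
// Hypothetical var-free variant of split above, with the same semantics:
// separators are dropped and empty groups are skipped.
// (Stream is deprecated since Scala 2.13; LazyList is its replacement.)
def splitStream[A](in: Stream[A], isSep: A => Boolean): Stream[List[A]] = {
  val rest = in.dropWhile(isSep) // skip leading or consecutive separators
  if (rest.isEmpty) Stream.empty
  else {
    val group = rest.takeWhile(a => !isSep(a)).toList
    // The recursive call sits in the lazily evaluated tail of #::,
    // so later groups are only computed when they are requested.
    group #:: splitStream(rest.dropWhile(a => !isSep(a)), isSep)
  }
}

val list = List("asdf", "aa", "", "fw", "", "wfwf", "", "")
splitStream(list.toStream, (s: String) => s.isEmpty) foreach println
// List(asdf, aa)
// List(fw)
// List(wfwf)
```

Because the tail is lazy, this also works on an infinite input such as `Stream.from(1)` as long as only a finite number of groups is requested.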

Answer 4

Here it is:

scala> val data = List("abc", "def", "", "ghi", "jkl", "mno", "", "pqr").iterator
data: Iterator[String] = non-empty iterator

scala> val splitter: String => Boolean = s => s.isEmpty
splitter: String => Boolean = <function1>

scala> def f[A](in: Iterator[A], sf: A => Boolean): Iterator[Iterator[A]] =
     |   in.hasNext match {
     |     case false => Iterator()
     |     case true  => Iterator(in.takeWhile(x => !sf(x))) ++ f(in, sf)
     |   }
f: [A](in: Iterator[A], sf: A => Boolean)Iterator[Iterator[A]]

scala> f(data, splitter) foreach (x => println(x.toList))
List(abc, def)
List(ghi, jkl, mno)
List(pqr)
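One caveat about this version: it keeps using `in` after calling `takeWhile` on it, and the Scaladoc for Iterator warns that an iterator should not be used after calling a method on it (next and hasNext being the exceptions). It also only works if each inner iterator is fully consumed before the outer one advances. A sketch that avoids both issues by buffering one group at a time into a List (so only the current group is in memory) is below; `splitIterator` is a hypothetical name, and for the example data it should produce the same groups as the version above.

```scala
// Hypothetical lazy splitter: each call to next() reads input up to the next
// separator (or the end), so only the group being built is held in memory.
def splitIterator[A](input: Iterator[A], splitFcn: A => Boolean): Iterator[List[A]] =
  new Iterator[List[A]] {
    def hasNext: Boolean = input.hasNext

    def next(): List[A] = {
      if (!input.hasNext) throw new NoSuchElementException("next on empty iterator")
      val group = List.newBuilder[A]
      var done = false
      while (!done && input.hasNext) {
        val a = input.next()
        if (splitFcn(a)) done = true // the separator ends the group and is discarded
        else group += a
      }
      group.result()
    }
  }

val data = List("abc", "def", "", "ghi", "jkl", "mno", "", "pqr").iterator
val splitter: String => Boolean = s => s.isEmpty
splitIterator(data, splitter).foreach(println)
// List(abc, def)
// List(ghi, jkl, mno)
// List(pqr)
```

Since each group is an ordinary List, the outer iterator can be consumed in any way (for example with `.toList`) without the groups interfering with each other.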

Reversing flatMap in Scala
