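I'm trying to take an iterator of strings and turn it into an iterator of collections of strings, based on an arbitrary splitting function.
So say I have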
val splitter: String => Boolean = s => s.isEmpty
Then I'd like to take
val data = List("abc", "def", "", "ghi", "jkl", "mno", "", "pqr").iterator
and a function
def f[A] (input: Iterator[A], splitFcn: A => Boolean): Iterator[X[A]]
where X can be any collection-like class you like, as long as it can be converted to a Seq, so that
f(data, splitter).foreach(x => println(x.toList))
prints
List("abc", "def")
List("ghi", "jkl", "mno")
List("pqr")
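Is there a clean way to do this that doesn't require collecting the input iterator's results completely into memory?
Answer 0 (score: 0)
This should do what you want: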
val splitter: String => Boolean = s => s.isEmpty
val data = List("abc", "def", "", "ghi", "jkl", "", "mno", "pqr")
def splitList[A](l: List[A], p: A => Boolean): List[List[A]] = {
  l match {
    case Nil => Nil
    case _ =>
      val (h, t) = l.span(a => !p(a))
      h :: splitList(t.drop(1), p)
  }
}
println(splitList(data, splitter))
//prints List(List(abc, def), List(ghi, jkl), List(mno, pqr))
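Answer 1 (score: 0)
Update #2: Travis Brown answered another question using Scalaz-streams, which is an interesting approach that might help you. I've only just started looking at the package, but I was quickly able to use it to read data from a file containing this: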
abc
def

ghi
jkl
mno

pqr
and produce another file that looks like this:
Vector(abc, def, )
Vector(ghi, jkl, mno, )
Vector(pqr)
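The library only holds the Vector currently being accumulated in memory. Here's my code (which should be considered dangerous, since I know next to nothing about Scalaz-streams):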
import scalaz.stream._
io.linesR("/tmp/a")
  .pipe( process1.chunkBy(_.nonEmpty) )
  .map( _.toString + "\n" )
  .pipe(text.utf8Encode)
  .to( io.fileChunkW("/tmp/b") )
  .run.run
The key to your task is chunkBy(_.nonEmpty), which accumulates lines into a Vector until it hits an empty line. I have no idea why you have to say run twice.
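To make the grouping rule easier to see without the library, here is a minimal plain-Scala sketch that mimics the output shown above. chunkByLike is a hypothetical helper, not part of Scalaz-streams, and it works eagerly on a List purely for illustration. The assumption, based on the sample output, is that the first element rejected by the predicate closes the chunk and stays inside it.

```scala
object ChunkBySketch {
  // Hypothetical helper (not a scalaz-stream API): each chunk ends with the
  // first element for which `keep` returns false, and that element is kept.
  // Eager and not stack-safe; meant only to illustrate the grouping rule.
  def chunkByLike[A](xs: List[A])(keep: A => Boolean): List[Vector[A]] =
    xs match {
      case Nil => Nil
      case _ =>
        val (kept, rest) = xs.span(keep)
        // rest.take(1) is the terminating element, if there is one
        (kept ++ rest.take(1)).toVector :: chunkByLike(rest.drop(1))(keep)
    }

  def main(args: Array[String]): Unit = {
    val lines = List("abc", "def", "", "ghi", "jkl", "mno", "", "pqr")
    // Dropping the empty strings from each chunk gives the groups the question asks for.
    chunkByLike(lines)(_.nonEmpty).map(_.filter(_.nonEmpty)).foreach(println)
  }
}
```

This is why the output file above has a trailing ", " inside the first two Vectors: the empty line that ends each chunk is still part of it.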
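Old stuff below.
Update #1: Ah! I just noticed the new constraint that the data can't all be read into memory. This solution isn't for you then; you want Iterators or Streams.
I'm guessing you want to enrich Traversable. With the function in a separate parameter list, the compiler can infer the types. For performance, you probably want to make only one pass over the data. And to avoid blowing the stack on large data sets (and for performance), you don't want any recursion that isn't tail-recursive. Given this enrichment: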
import scala.annotation.tailrec
implicit class EnrichedTraversable[A]( val xs:Traversable[A] ) extends AnyVal {
  def splitWhere( f: A => Boolean ) = {
    @tailrec
    def loop( xs:Traversable[A], group:Seq[A], groups:Seq[Seq[A]] ):Seq[Seq[A]] =
      if ( xs.isEmpty ) {
        groups :+ group
      } else {
        val x = xs.head
        val rest = xs.tail
        if ( f(x) ) loop( rest, Vector(), groups :+ group )
        else loop( rest, group :+ x, groups )
      }
    loop( xs, Vector(), Vector() )
  }
}
you can do this:
List("a","b","","c","d") splitWhere (_.isEmpty)
Here are some tests you may want to check to make sure the semantics are what you want (I personally like split to behave this way):
val xs = List("a","b","","d","e","","f","g") //> xs : List[String] = List(a, b, "", d, e, "", f, g)
xs splitWhere (_.isEmpty) //> res0: Seq[Seq[String]] = Vector(Vector(a, b), Vector(d, e), Vector(f, g))
List("a","b","") splitWhere (_.isEmpty) //> res1: Seq[Seq[String]] = Vector(Vector(a, b), Vector())
List("") splitWhere (_.isEmpty) //> res2: Seq[Seq[String]] = Vector(Vector(), Vector())
List[String]() splitWhere (_.isEmpty) //> res3: Seq[Seq[String]] = Vector(Vector())
Vector("a","b","","c") splitWhere (_.isEmpty) //> res4: Seq[Seq[String]] = Vector(Vector(a, b), Vector(c))
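The remark above about putting the function in its own parameter list can be seen in a small sketch. The two helpers below are hypothetical and only count separators; the point is how Scala 2 types the lambda in each case.

```scala
object InferenceSketch {
  // Hypothetical helpers for illustration; the names are not from the answer.

  // Single parameter list: Scala 2 types both arguments together, so a bare
  // lambda such as `_.isEmpty` has no known parameter type here and the
  // caller has to annotate it.
  def countSeparators1[A](xs: Iterable[A], isSeparator: A => Boolean): Int =
    xs.count(isSeparator)

  // Separate parameter lists: A is fixed by `xs` before the lambda is typed.
  def countSeparators2[A](xs: Iterable[A])(isSeparator: A => Boolean): Int =
    xs.count(isSeparator)

  val annotated: Int = countSeparators1(List("a", "", "b"), (s: String) => s.isEmpty)
  val inferred: Int = countSeparators2(List("a", "", "b"))(_.isEmpty)
}
```

The same reasoning is why splitWhere, being a method on the enriched collection, can be called with just (_.isEmpty).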
Answer 2 (score: 0)
I think Streams are what you want, because they are evaluated lazily (not everything is in memory).
def split[A](inputStream: Stream[A], splitter: A => Boolean): Stream[List[A]] = {
  var accumulationList: List[A] = Nil
  def loop(inputStream: Stream[A]): Stream[List[A]] = {
    if (inputStream.isEmpty) {
      if (accumulationList.isEmpty)
        Stream.empty[List[A]]
      else
        accumulationList.reverse #:: Stream.empty[List[A]]
    } else if (splitter(inputStream.head)) {
      val outputList = accumulationList.reverse
      accumulationList = Nil
      if (outputList.isEmpty)
        loop(inputStream.tail)
      else
        outputList #:: loop(inputStream.tail)
    } else {
      accumulationList = inputStream.head :: accumulationList
      loop(inputStream.tail)
    }
  }
  loop(inputStream)
}
val splitter = { s: String => s.isEmpty }
val list = List("asdf", "aa", "", "fw", "", "wfwf", "", "")
val stream = split(list.toStream, splitter)
stream foreach println
The output is:
List(asdf, aa)
List(fw)
List(wfwf)
Edit:
I haven't looked into it closely, but I think my recursive method loop could be replaced with foldLeft or foldRight.
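As a rough sketch of what the foldLeft variant might look like (splitFold is a hypothetical name, and this is not the answer's code): it keeps the same rule of skipping empty groups, but note that foldLeft is strict, so it walks the whole input before returning and gives up the laziness of the Stream version.

```scala
object FoldSplitSketch {
  // Strict variant: every group is in memory once foldLeft returns.
  // The accumulator is (finished groups, current group in reverse order).
  def splitFold[A](input: Iterable[A])(isSeparator: A => Boolean): Vector[List[A]] = {
    val (done, current) =
      input.foldLeft((Vector.empty[List[A]], List.empty[A])) {
        case ((groups, acc), a) =>
          if (!isSeparator(a)) (groups, a :: acc)   // extend the open group
          else if (acc.isEmpty) (groups, acc)       // separator with nothing to close
          else (groups :+ acc.reverse, Nil)         // close the open group
      }
    // Emit whatever is left after the last separator.
    if (current.isEmpty) done else done :+ current.reverse
  }
}
```

So a fold is fine when the data fits in memory, but for the constraint in the question the recursive Stream version (or an Iterator) is the better fit.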
Answer 3 (score: 0)
Here it is:
scala> val data = List("abc", "def", "", "ghi", "jkl", "mno", "", "pqr").iterator
data: Iterator[String] = non-empty iterator
scala> val splitter: String => Boolean = s => s.isEmpty
splitter: String => Boolean = <function1>
scala> def f[A](in: Iterator[A], sf: A => Boolean): Iterator[Iterator[A]] =
     |   in.hasNext match {
     |     case false => Iterator()
     |     case true => Iterator(in.takeWhile(x => !sf(x))) ++ f(in, sf)
     |   }
f: [A](in: Iterator[A], sf: A => Boolean)Iterator[Iterator[A]]
scala> f(data, splitter) foreach (x => println(x.toList))
List(abc, def)
List(ghi, jkl, mno)
List(pqr)
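One caveat with the session above: it keeps using in after calling takeWhile on it, which the Iterator documentation says not to do, and it only works if each inner iterator is fully consumed before the outer one advances. A minimal sketch of an alternative, assuming it is acceptable to hold one group at a time in memory (splitIterator is a hypothetical name):

```scala
object LazySplitSketch {
  // Hypothetical helper: builds one complete group per call to next(), so at
  // most one group is in memory and `in` is never shared with an inner iterator.
  def splitIterator[A](in: Iterator[A])(isSeparator: A => Boolean): Iterator[Vector[A]] =
    new Iterator[Vector[A]] {
      def hasNext: Boolean = in.hasNext
      def next(): Vector[A] = {
        if (!in.hasNext) throw new NoSuchElementException("next on empty iterator")
        val group = Vector.newBuilder[A]
        var open = true
        while (open && in.hasNext) {
          val a = in.next()
          if (isSeparator(a)) open = false   // separator ends the group and is dropped
          else group += a
        }
        group.result()
      }
    }

  def main(args: Array[String]): Unit = {
    val data = List("abc", "def", "", "ghi", "jkl", "mno", "", "pqr").iterator
    splitIterator(data)(_.isEmpty).foreach(group => println(group.toList))
  }
}
```

With the data from the question this prints the same three lists as above. Consecutive separators produce an empty group here; filter those out if that is not the behavior you want.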