How to split a file into blocks of contiguous non-empty lines in Scala?

Time: 2016-09-20 14:50:26

Tags: scala parsing

E.g., we have a file with the following content:

aaa
  bbb
ccc

dd dd
eee
fff

gg
hhhhh

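The task is to parse this file into an ordered/numbered collection (a map, an array, or something else) that contains the three contiguous blocks, each as a collection of strings.

An algorithmic, Java-style way of doing this seems fairly obvious, but it would be great if someone could suggest a functional, idiomatic Scala solution.

6 answers:

Answer 0 (score: 3)

Using Stream.span: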

scala> def chunks(s: Stream[String]): Stream[Seq[String]] = {
     | val (h, t) = s.span(_.nonEmpty)
     | h.toSeq #:: chunks(t.tail) }
chunks: (s: Stream[String])Stream[Seq[String]]

With a little more finesse, so that it terminates at the end of the input:

scala> def chunks(s: Stream[String]): Stream[Stream[String]] = {
     | val (h, t) = s.span(_.nonEmpty)
     | if (h.isEmpty) Stream.empty else h #:: chunks(t drop 1) }
chunks: (s: Stream[String])Stream[Stream[String]]

scala> val cs = chunks(lines.lines.toStream).iterator
cs: Iterator[Stream[String]] = non-empty iterator

scala> cs.next.toList
res0: List[String] = List(aaa, "  bbb", ccc)

scala> cs.next.toList
res1: List[String] = List(dd dd, eee, fff)

scala> cs.next.toList
res2: List[String] = List(gg, hhhhh)

scala> cs.hasNext
res3: Boolean = false
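
Note that the refined chunks above stops as soon as it meets an empty chunk, so a leading blank line or two consecutive blank lines end the stream early. A minimal sketch of a variant that skips runs of blank lines, assuming the input fits in memory as a List; the name chunksStrict is a hypothetical helper, not part of the answer:

```scala
import scala.annotation.tailrec

// Hypothetical helper: splits lines into blocks of contiguous non-blank lines,
// skipping any run of blank (empty or whitespace-only) lines between them.
def chunksStrict(lines: List[String]): List[List[String]] = {
  @tailrec
  def loop(rest: List[String], acc: List[List[String]]): List[List[String]] = {
    val trimmed = rest.dropWhile(_.trim.isEmpty) // skip blank lines before a block
    if (trimmed.isEmpty) acc.reverse             // nothing left: restore original order
    else {
      val (block, tail) = trimmed.span(_.trim.nonEmpty) // take one contiguous block
      loop(tail, block :: acc)
    }
  }
  loop(lines, Nil)
}
```

Unlike the Stream version it is strict, so it is only a fit when laziness is not needed.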

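Answer 1 (score: 0)

Join the lines with an arbitrary delimiter (|), then split on runs of the delimiter and group into separate blocks:

import scala.io.Source

// Assumes no line contains '|'; otherwise pick a different delimiter.
val blocks: List[List[String]] = Source
  .fromFile("<path-to-file>").getLines()
  .mkString("|")
  .split("\\|{2,}").toList
  .map(_.split("\\|").toList)

This gives you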

List(List(aaa,   bbb, ccc), List(dd dd, eee, fff), List(gg, hhhhh))
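
The snippet above never closes the Source it opens. A minimal sketch of reading the lines and releasing the file handle, assuming UTF-8 input; readLines is a hypothetical helper name, not something from the answer:

```scala
import scala.io.Source

// Hypothetical helper: reads all lines of a file and closes it afterwards.
// The UTF-8 encoding is an assumption; adjust it to match the actual file.
def readLines(path: String): List[String] = {
  val source = Source.fromFile(path, "UTF-8")
  try source.getLines().toList // force the lazy iterator before the file is closed
  finally source.close()
}
```

The resulting List[String] can then be fed to any of the splitting approaches on this page.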

Answer 2 (score: 0)

You can use split to transform it:

def contigSplit(s : String) : Array[Array[String]] = s.split("\n\n").map(_.split("\n"))

This works because a contiguous block ends with two newline characters.

REPL usage:

scala> val s = """
     | aaa
     |   bbb
     | ccc
     | 
     | dd dd
     | eee
     | fff
     | 
     | gg
     | hhhhh
     | """

scala> s.split("\n\n").map(_.split("\n"))
res7: Array[Array[String]] = Array(Array("", aaa, "  bbb", ccc), Array(dd dd, eee, fff), Array(gg, hhhhh))

Alternative:

If the blank lines may contain other whitespace, you can split with a regular expression:

def contigSplitRegEx(s : String) : Array[Array[String]] = "\n\\s*\n".r.split(s).map(_.split("\n"))
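
As res7 shows, a leading newline leaves an empty string at the start of the first block. A small sketch that drops such blank entries after splitting, assuming "\n" line endings; contigSplitClean is a hypothetical name used only for illustration:

```scala
// Hypothetical variant of the regex split: removes blank entries left over
// from leading or trailing newlines, and drops blocks that end up empty.
def contigSplitClean(s: String): List[List[String]] =
  s.split("\n\\s*\n").toList                           // split on blank (or whitespace-only) lines
    .map(_.split("\n").toList.filter(_.trim.nonEmpty)) // split a block into lines, drop blank ones
    .filter(_.nonEmpty)                                // drop blocks with no lines left
```

Indentation inside a block is preserved, since only whole blank lines are filtered out.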

Answer 3 (score: 0)

def getListOfContiguousLists(iterator: Iterator[String]): List[List[String]] = {
  val (listOfContiguousList, lastList) = iterator
    .foldLeft((List.empty[List[String]], List.empty[String]))({
      case ((listOfLists, list), line) => (line.isEmpty, list.isEmpty) match {
        case (true, true) => (listOfLists, list)
        case (true, false) => (listOfLists :+ list, List.empty[String])
        case (false, _) => (listOfLists, list :+ line)
      }
    })
  lastList.isEmpty match {
    case true => listOfContiguousList
    case false => listOfContiguousList :+ lastList
  }
}

val list = getListOfContiguousLists(scala.io.Source.fromFile("").getLines)
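
Appending with :+ copies the whole List on every step. A sketch of the same fold that prepends and reverses instead, which keeps the work linear in the number of lines; blocksByFold is a hypothetical name, and the logic is otherwise meant to mirror the function above:

```scala
// Hypothetical variant of the fold above: builds blocks by prepending
// (constant time on List) and reverses once at the end.
def blocksByFold(lines: Iterator[String]): List[List[String]] = {
  val (blocks, current) =
    lines.foldLeft((List.empty[List[String]], List.empty[String])) {
      case ((done, cur), line) =>
        if (line.nonEmpty) (done, line :: cur)            // grow the current block (in reverse)
        else if (cur.nonEmpty) (cur.reverse :: done, Nil) // blank line closes the current block
        else (done, cur)                                  // ignore extra blank lines
    }
  // Flush the last block if the input did not end with a blank line.
  (if (current.nonEmpty) current.reverse :: blocks else blocks).reverse
}
```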

Answer 4 (score: 0)

Maintain a map of numbered buckets and use the line number % 3 to decide which bucket each line goes into.
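
A numbered map of blocks, as the question asks for, can be sketched as follows. Note that this is a swapped-in technique: it keys each bucket by a block index that advances on blank lines, not by line number modulo 3, because the modulo rule would not keep contiguous lines together. The name numberedBlocks is hypothetical:

```scala
import scala.collection.immutable.SortedMap

// Hypothetical helper: maps a zero-based block index to the lines of that block.
// The index advances when a blank line follows a non-empty block.
def numberedBlocks(lines: Iterator[String]): SortedMap[Int, Vector[String]] = {
  val start = (SortedMap.empty[Int, Vector[String]], 0)
  val (buckets, _) = lines.foldLeft(start) {
    case ((acc, idx), line) =>
      if (line.nonEmpty)
        (acc.updated(idx, acc.getOrElse(idx, Vector.empty[String]) :+ line), idx)
      else if (acc.contains(idx)) (acc, idx + 1) // blank line after a block: move to next bucket
      else (acc, idx)                            // extra blank line: stay on the same index
  }
  buckets
}
```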

Answer 5 (score: 0)

It is not purely functional, but as an iterator it is very concise:

def groupBlanksIterator(xs: Iterator[String]): Iterator[List[String]] =
  new Iterator[List[String]] {
    def hasNext: Boolean = xs.hasNext
    def next(): List[String] = xs.takeWhile(_.nonEmpty).toList
  }

groupBlanksIterator(scala.io.Source.fromFile("whatever").getLines)
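
The iterator above keeps using xs after calling takeWhile on it, which the Iterator documentation advises against, and it yields an empty list for each extra blank line. A sketch of a variant built on BufferedIterator that avoids both; groupBlocks is a hypothetical name:

```scala
// Hypothetical variant: peeks at the next line through a BufferedIterator
// instead of reusing the iterator after takeWhile, and skips runs of blank lines.
def groupBlocks(xs: Iterator[String]): Iterator[List[String]] = {
  val it = xs.buffered
  new Iterator[List[String]] {
    private def skipBlanks(): Unit =
      while (it.hasNext && it.head.isEmpty) it.next()

    def hasNext: Boolean = { skipBlanks(); it.hasNext }

    def next(): List[String] = {
      skipBlanks()
      if (!it.hasNext) throw new NoSuchElementException("no more blocks")
      val block = List.newBuilder[String]
      while (it.hasNext && it.head.nonEmpty) block += it.next() // consume one contiguous block
      block.result()
    }
  }
}
```

It stays lazy at the block level: each block is read only when next() is called.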