Rewriting string processing in a more functional style

Time: 2013-05-07 21:01:37

Tags: scala

I'm reading lines from a file

for (line <- Source.fromFile("test.txt").getLines) {
  ....
}

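I basically want to end up with a list of paragraphs. An empty line starts a new paragraph, and I may want to parse some keyword-value pairs in the future.

The text file contains a list of entries like this (or something similar, like an INI file)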

User=Hans
Project=Blow up the moon
The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs. 

User=....

I basically want a List[Project], where Project looks something like

class Project (val User: String, val Name:String, val Desc: String) {}

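The description is a large chunk of text that does not begin with <keyword>=, but it can stretch over any number of lines.

I know how to do this in an iterative style: check each line against a list of keywords, populate an instance of the class, and add it to a list to return later.

But I think it should be possible to do this in a proper functional style (possibly with match case, yield, and recursion), resulting in a list of objects with the fields User, Project, and so on. The class in use is known, all the keywords are known, and the file format is not set in stone either. I'm mostly trying to learn a better functional style.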

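5 answers:

Answer 0: (score: 8)

You're obviously parsing something, so it might be time to use... a parser!

Since your mini-language seems to treat line breaks as significant, you will need to refer to this question to tell the parser so.

Apart from that, a fairly simple implementation would be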

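// Note: since Scala 2.11 the parser combinators live in the separate
// scala-parser-combinators module, which may need to be added as a dependency.
import scala.util.parsing.combinator.RegexParsers

case class Project(user: String, name: String, description: String)

object ProjectParser extends RegexParsers {
  // Skip only spaces and tabs, so that line breaks stay significant.
  override val whiteSpace = """[ \t]+""".r

  def eol: Parser[String] = """\r?\n""".r

  // Exclude \r as well as \n so that Windows line endings do not leak into the values.
  def user: Parser[String] = "User=" ~> """[^\r\n]*""".r <~ eol
  def name: Parser[String] = "Project=" ~> """[^\r\n]*""".r <~ eol
  // Zero or more non-empty lines; a blank line ends the description.
  def description: Parser[String] = repsep("""[^\r\n]+""".r, eol) ^^ { case l => l.mkString("\n") }
  def project: Parser[Project] = user ~ name ~ description ^^ { case a ~ b ~ c => Project(a, b, c) }
  // Projects are separated by a blank line, i.e. two consecutive line breaks.
  def projects: Parser[List[Project]] = repsep(project, eol ~ eol)
}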

And here is how to use it:

val sample = """User=foo1
Project=bar1
desc1
desc2
desc3

User=foo
Project=bar
desc4 desc5 desc6
desc7 desc8 desc9"""

import scala.util.parsing.input._
val reader = new CharSequenceReader(sample)
val res = ProjectParser.parseAll(ProjectParser.projects, reader)
if(res.successful) {
    print("Found projects: " + res.get)
} else {
    print(res)
}

Answer 1: (score: 1)

Another possible implementation (since this parser is fairly simple), using recursion:

import scala.io.Source
case class Project(user: String, name: String, desc: String)
@scala.annotation.tailrec
def parse(source: Iterator[String], list: List[Project] = Nil): List[Project] = {
  val emptyProject = Project("", "", "")
  @scala.annotation.tailrec
  def parseProject(project: Option[Project] = None): Option[Project] = {
    if(source.hasNext) {
      val line = source.next
      if(!line.isEmpty) {
        val splitted = line.span(_ != '=')
        parseProject(splitted match {
          case (h, t) if h == "User" => project.orElse(Some(emptyProject)).map(_.copy(user = t.drop(1)))
          case (h, t) if h == "Project" => project.orElse(Some(emptyProject)).map(_.copy(name = t.drop(1)))
          case _ => project.orElse(Some(emptyProject)).map(project => project.copy(desc = (if(project.desc.isEmpty) "" else project.desc ++ "\n") ++ line))
        })
      } else project
    } else project
  }

  if(source.hasNext) {
    parse(source, parseProject().map(_ :: list).getOrElse(list))
  } else list.reverse
}

Test:

object Test {
  def source = Source.fromString("""User=Hans
Project=Blow up the moon
The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.

User=Plop
Project=SO
Some desc""")

  def test = println(parse(source.getLines))
}

This gives:

List(Project(Hans,Blow up the moon,The slugs are going to eat the mustard. // multiline possible!
They are sneaky bastards, those slugs.), Project(Plop,SO,Some desc))

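Answer 2: (score: 1)

To answer your question without also tackling keyword parsing: fold over the lines and aggregate them, unless a line is empty, in which case you start a new empty paragraph.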

// `lines` is assumed to be a collection or iterator of the file's lines.
lines.foldLeft(List("")) { (l, x) =>
  if (x.isEmpty) "" :: l else (l.head + "\n" + x) :: l.tail
}.reverse

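You'll notice this has some wrinkles in how it handles zero lines, as well as multiple and trailing empty lines. Adapt it to your needs. Also, if you are particular about the string concatenation, you can collect the lines in a nested list and flatten at the end (using .map(_.mkString)). This is just to showcase the basic technique of folding a sequence not to a scalar but to a new sequence.

This builds the list in reverse order because list prepend (::) is more efficient than appending to l at every step.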
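The nested-list variant mentioned above could look roughly like the following. This is only a sketch, not the answerer's code: the names Paragraphs and paragraphs are made up for illustration, and treating whitespace-only lines as blank and dropping empty paragraphs are assumptions about how the wrinkles should be smoothed.

```scala
object Paragraphs {
  // Groups lines into paragraphs; a blank line closes the current paragraph.
  // Each paragraph is kept as a reversed List[String] while folding and is
  // joined into a single String only once, at the end.
  def paragraphs(lines: Iterator[String]): List[String] =
    lines
      .foldLeft(List(List.empty[String])) { (acc, line) =>
        if (line.trim.isEmpty) Nil :: acc        // assumption: whitespace-only counts as blank
        else (line :: acc.head) :: acc.tail      // prepend to the current paragraph
      }
      .filter(_.nonEmpty)                        // drop empties left by repeated or trailing blank lines
      .reverse                                   // paragraphs back in source order
      .map(_.reverse.mkString("\n"))             // lines back in source order, then join
}
```

Both the outer and the inner list are built by prepending, so each needs one reverse at the end.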

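Answer 3: (score: 1)

You're obviously building something, so you might want to try... a builder!

Like Jürgen, my first thought was to fold, since you're accumulating a result.

A mutable.Builder does the accumulation mutably, with a collection.generic.CanBuildFrom to indicate which builder to use to make a target collection from a source collection. You keep the mutable thing around just long enough to get the result. So that's my plug for localized mutability, lest anyone assume that the path from List[String] to List[Project] has to be immutable.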
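As a rough illustration of that localized mutability, here is a sketch that accumulates into the builder returned by List.newBuilder. It is not the answerer's code: BuilderSketch, parseProjects, KeyValue, Empty, and flush are names invented for this example, and the rule that every non-keyword line extends the description is an assumption.

```scala
object BuilderSketch {
  final case class Project(user: String, name: String, description: String)

  private val KeyValue = "(User|Project)=(.*)".r
  private val Empty = Project("", "", "")

  def parseProjects(lines: Iterator[String]): List[Project] = {
    val projects = List.newBuilder[Project]   // mutable, but never escapes this method
    var current: Option[Project] = None       // the project being assembled, if any

    // Moves the project in progress (if any) into the builder.
    def flush(): Unit = {
      current.foreach(projects += _)
      current = None
    }

    for (line <- lines) {
      val cur = current.getOrElse(Empty)
      line match {
        case l if l.trim.isEmpty    => flush()                                // blank line ends a project
        case KeyValue("User", u)    => current = Some(cur.copy(user = u))
        case KeyValue("Project", n) => current = Some(cur.copy(name = n))
        case text =>                                                          // assumed: description line
          val desc = if (cur.description.isEmpty) text else cur.description + "\n" + text
          current = Some(cur.copy(description = desc))
      }
    }
    flush()                                   // the last project has no trailing blank line
    projects.result()                         // immutable List[Project] from here on
  }
}
```

Callers only ever see an Iterator[String] going in and an immutable List[Project] coming out; the var and the builder stay private to the method.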

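To the other fine answers (the ones with non-negative appreciation ratings), I would add that functional style means functional decomposition, and usually small functions.

If you're not using regex parsers, don't neglect regexes in your pattern matches.

And try to spare the dots. In fact, I believe tomorrow is Spare the Dots Day, and people with a sensitivity to dots are advised to stay indoors.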

case class Project(user: String, name: String, description: String)

trait Sample {
  val sample = """
  |User=Hans
  |Project=Blow up the moon
  |The slugs are going to eat the mustard. // multiline possible!
  |They are sneaky bastards, those slugs. 
  |
  |User=Bob
  |I haven't thought up a project name yet.
  |
  |User=Greta
  |Project=Burn the witch
  |It's necessary to escape from the witch before
  |we blow up the moon.  I hope Hans sees it my way.
  |Once we burn the bitch, I mean witch, we can
  |wreak whatever havoc pleases us.
  |""".stripMargin
}

object Test extends App with Sample {
  val kv = "(.*?)=(.*)".r
  def nonnully(s: String) = if (s == null) "" else s + " "
  val empty = Project(null, null, null)
  // linesIterator rather than lines: on JDK 11+ `lines` resolves to java.lang.String#lines.
  val (res, dummy) = ((List.empty[Project], empty) /: sample.linesIterator) { (acc, line) =>
    val (sofar, cur) = acc
    line match {
      case kv("User", u)    => (sofar, cur copy (user = u))
      case kv("Project", n) => (sofar, cur copy (name = n))
      case kv(k, _)         => sys error s"Bad keyword $k"
      case x if x.nonEmpty  => (sofar, cur copy (description = s"${nonnully(cur.description)}$x"))
      case _ if cur != empty => (cur :: sofar, empty)
      case _                => (sofar, empty)
    }
  }
  val ps = if (dummy == empty) res.reverse else (dummy :: res).reverse
  Console println ps
}

The match can also be mashed into the function literal like this:

  val (res, dummy) = ((List.empty[Project], empty) /: sample.linesIterator) {
    case ((sofar, cur), kv("User", u))     => (sofar, cur copy (user = u))
    case ((sofar, cur), kv("Project", n))  => (sofar, cur copy (name = n))
    case ((sofar, cur), kv(k, _))          => sys error s"Bad keyword $k"
    case ((sofar, cur), x) if x.nonEmpty   => (sofar, cur copy (description = s"${nonnully(cur.description)}$x"))
    case ((sofar, cur), _) if cur != empty => (cur :: sofar, empty)
    case ((sofar, cur), _)                 => (sofar, empty)
  }

Before folding, it seemed simpler to split into paragraphs first. Is that imperative thinking?

object Test0 extends App with Sample {
  def grafs(ss: Iterator[String]): List[List[String]] = {
    val (g, rest) = ss dropWhile (_.isEmpty) span (_.nonEmpty)
    val others = if (rest.nonEmpty) grafs(rest) else Nil
    g.toList :: others
  }
  def toProject(ss: List[String]): Project = {
    var p = Project("", "", "")
    for (line <- ss; parts = line split '=') parts match {
      case Array("User", u)    => p = p.copy(user = u)
      case Array("Project", n) => p = p.copy(name = n)
      case Array(k, _)         => sys error s"Bad keyword $k"
      case Array(text)         => p = p.copy(description = s"${p.description} $text")
    }
    p
  }
  val ps = grafs(sample.linesIterator) map toProject
  Console println ps
}

Answer 4: (score: -1)

class Project (val User: String, val Name:String, val Desc: String) {}
object Project {
  def apply(str: String): Project = {
    val user = somehowFetchUserName(str)
    val name = somehowFetchProjectName(str)
    val desc = somehowFetchDescription(str)
    new Project(user, name, desc)
  }
}

val contents: Array[String] = Source.fromFile("test.txt").mkString.split("\\n\\n")
val list = contents map(Project(_))

You will end up with a collection of projects (an Array[Project] here; call .toList if a List is needed).
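The somehowFetch... helpers are left as placeholders in this answer. Purely as a hypothetical way to fill them in, the sketch below looks up "key=" lines inside each paragraph and treats everything else as the description. Answer4Sketch, field, Keys, and parseAll are invented names, and the sketch assumes that blank lines separating entries contain no spaces.

```scala
object Answer4Sketch {
  class Project(val User: String, val Name: String, val Desc: String)

  object Project {
    private val Keys = List("User", "Project")

    // Builds a Project from one paragraph (the lines of a single entry).
    def apply(paragraph: String): Project = {
      val lines = paragraph.split("\\r?\\n").toList

      // Hypothetical helper: value of the first "<key>=..." line, or "" if absent.
      def field(key: String): String =
        lines.collectFirst { case l if l.startsWith(key + "=") => l.drop(key.length + 1) }
          .getOrElse("")

      // Assumption: every line that is not a known keyword belongs to the description.
      val desc = lines
        .filterNot(l => Keys.exists(k => l.startsWith(k + "=")))
        .mkString("\n")

      new Project(field("User"), field("Project"), desc)
    }
  }

  // Splits the text on runs of two or more line breaks, then parses each paragraph.
  def parseAll(text: String): List[Project] =
    text.split("(\\r?\\n){2,}").toList
      .map(_.trim)
      .filter(_.nonEmpty)
      .map(Project(_))
}
```

With scala.io.Source imported, this could be called as Answer4Sketch.parseAll(Source.fromFile("test.txt").mkString).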