Parsing a log file with Scala

Time: 2018-01-28 22:05:09

Tags: scala

I am trying to parse a text file. My input file looks like this:

ID:   12343-7888
Name:  Mary, Bob, Jason, Jeff, Suzy
           Harry, Steve
           Larry, George
City:   New York, Portland, Dallas, Kansas City
        Tampa, Bend   

The expected output would be:

"12343-7888"
"Mary, Bob, Jason, Jeff, Suzy, Harry, Steve, Larry, George"
"New York, Portland, Dallas, Kansas City, Tampa, Bend"

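Note that the "Name" and "City" fields contain newlines or carriage returns. I have the code below, but it does not work. The second line of code puts each character on its own line. I am also having trouble extracting just the data from a field, for example returning only the actual names without "Name:" being part of the result. I would also like to put quotes around each returned field.

Can you help me fix my problem?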

val lines = Source.fromFile("/filesdata/logfile.text").getLines().toList
val record = lines.dropWhile(line => !line.startsWith("Name: ")).takeWhile(line  => !line.startsWith("Address: ")).flatMap(_.split(",")).map(_.trim()).filter(_.nonEmpty).mkString(", ")
val finalResults = record.map(s => "\"" + s + "\"").mkString(",\n")

How can I get the result I want?

1 answer:

Answer 0: (score: 1)

SHORT ANSWER

A two-liner that produces a string that looks exactly as you specified:

println(lines.map{line => if(line.trim.matches("[a-zA-Z]+:.*")) 
  ("\"\n\"" + line.split(":")(1).trim) else (", " + line.trim)}.mkString.drop(2) + "\"")
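The code in the question prints one character per line because `record` is already a single `String` (the result of `mkString`), so `record.map(...)` visits each `Char`. The sketch below spells out the two-liner above step by step. The `sample` list stands in for the lines read from the file, and the names `labeled`, `pieces`, `result`, and `perChar` are assumptions made for illustration.

```scala
// Sample lines, standing in for Source.fromFile(...).getLines().toList
val sample = List(
  "ID:   12343-7888",
  "Name:  Mary, Bob, Jason, Jeff, Suzy",
  "           Harry, Steve",
  "           Larry, George",
  "City:   New York, Portland, Dallas, Kansas City",
  "        Tampa, Bend   "
)

// A labeled line looks like "Label: content"
def labeled(line: String): Boolean = line.trim.matches("[a-zA-Z]+:.*")

val pieces = sample.map { line =>
  if (labeled(line)) "\"\n\"" + line.split(":")(1).trim // close the previous field, open a new one
  else ", " + line.trim                                   // continuation of the current field
}

// Drop the leading quote and newline, then close the last field
val result = pieces.mkString.drop(2) + "\""
println(result)

// The pitfall from the question: map on a String visits each Char
val perChar = "ab".map(c => "\"" + c + "\"").mkString(",\n")
println(perChar)
```

Note that `line.split(":")(1)` keeps only the text between the first and second colon, so this approach assumes the field content itself contains no colon.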

LONG ANSWER

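Why try to solve the problem in one line if you can achieve the same thing in 94 lines?

(This is the exact opposite of the usual motto when working with Scala collections, but the input was messy enough that I found it worthwhile to actually write out some of the intermediate steps. Maybe that is just because I recently bought a nice new keyboard...)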

val input = """ID:   12343-7888
Name:  Mary, Bob, Jason, Jeff, Suzy
           Harry, Steve
           Larry, George
City:   New York, Portland, Dallas, Kansas City
        Tampa, Bend
ID: 567865-676
Name: Alex, Bob 
  Chris, Dave 
     Evan, Frank
   Gary
City: Los Angeles, St. Petersburg
   Washington D.C., Phoenix
"""

case class Entry(id: String, names: List[String], cities: List[String])

def parseMessyInput(input: String): List[Entry] = {

  // just a first rough approximation of the structure of the input
  sealed trait MessyInputLine { def content: String }
  case class IdLine(content: String) extends MessyInputLine
  case class NameLine(content: String) extends MessyInputLine
  case class UnlabeledLine(content: String) extends MessyInputLine
  case class CityLine(content: String) extends MessyInputLine

  val lines = input.split("\n").toList

  // a helper function for checking whether a line starts with a label
  def tryParseLabeledLine
    (label: String, line: String)
    (cons: String => MessyInputLine)
  : Option[MessyInputLine] = {
    if (line.startsWith(label + ":")) {
      Some(cons(line.drop(label.size + 1)))
    } else {
      None
    }
  }

  val messyLines: List[MessyInputLine] = for (line <- lines) yield {
    (
      tryParseLabeledLine("Name", line){NameLine(_)} orElse
      tryParseLabeledLine("City", line){CityLine(_)} orElse
      tryParseLabeledLine("ID", line){IdLine(_)}
    ).getOrElse(UnlabeledLine(line))
  }

  /** Combines the content of the first line with the content
    * of all unlabeled lines, until the next labeled line or
    * the end of the list is hit. Returns the content of 
    * the first few lines and the list of the remaining lines.
    */
  def readUntilNextLabel(messyLines: List[MessyInputLine])
  : (List[String], List[MessyInputLine]) = {
    messyLines match {
      case Nil => (Nil, Nil)
      case h :: t => {
        val (unlabeled, rest) = t.span {
          case UnlabeledLine(_) => true
          case _ => false
        }
        (h.content :: unlabeled.map(_.content), rest)
      }
    }
  }

  /** Glues multiple lines to entries */
  def combineToEntries(messyLines: List[MessyInputLine]): List[Entry] = {
    if (messyLines.isEmpty) Nil
    else {
      val (idContent, namesCitiesRest) = readUntilNextLabel(messyLines)
      val (namesContent, citiesRest) = readUntilNextLabel(namesCitiesRest)
      val (citiesContent, rest) = readUntilNextLabel(citiesRest)
      val id = idContent.head.trim
      val names = namesContent.map(_.split(",").map(_.trim).toList).flatten
      val cities = citiesContent.map(_.split(",").map(_.trim).toList).flatten
      Entry(id, names, cities) :: combineToEntries(rest)
    }
  }

  // invoke recursive function on the entire input
  combineToEntries(messyLines)
}

// how to use
val entries = parseMessyInput(input)

// output
for (Entry(id, names, cities) <- entries) {
  println(id)
  println(names.mkString(", "))
  println(cities.mkString(", "))
}

Output:

12343-7888
Mary, Bob, Jason, Jeff, Suzy, Harry, Steve, Larry, George
New York, Portland, Dallas, Kansas City, Tampa, Bend
567865-676
Alex, Bob, Chris, Dave, Evan, Frank, Gary
Los Angeles, St. Petersburg, Washington D.C., Phoenix
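The question also asked for quotes around each returned field, which the output above omits. A minimal sketch of one way to add them, assuming the `Entry` case class from the answer (repeated here so the snippet stands alone); the helper names `quoted` and `render` and the `example` value are hypothetical:

```scala
case class Entry(id: String, names: List[String], cities: List[String])

// Wrap a single field in double quotes
def quoted(s: String): String = "\"" + s + "\""

// One quoted line per field: ID, names, cities
def render(e: Entry): String =
  List(e.id, e.names.mkString(", "), e.cities.mkString(", "))
    .map(quoted)
    .mkString("\n")

val example = Entry("12343-7888", List("Mary", "Bob"), List("New York", "Tampa"))
println(render(example))
```

Replacing the three `println` calls in the loop above with `println(render(entry))` for each entry would then print every field in quotes.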

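You could probably write it in a single line, sooner or later. But if you write dumb code made up of many simple intermediate steps, you do not have to think as hard, and no single obstacle is large enough to get stuck on.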
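To illustrate the trade-off, here is one possible more compact variant: a fold that starts a new field at every labeled line and appends unlabeled lines to the current field. It is only a sketch, not the approach of the answer above. The name `groupFields` and the regular expression `Labeled` are assumptions, and the function returns label/content pairs rather than `Entry` values.

```scala
// Matches "Label: content"; unlabeled continuation lines do not match
val Labeled = """\s*([A-Za-z]+):(.*)""".r

def groupFields(lines: List[String]): List[(String, String)] =
  lines.foldLeft(List.empty[(String, String)]) {
    case (acc, Labeled(label, content)) =>
      (label, content.trim) :: acc                 // start a new field
    case ((label, content) :: rest, line) if line.trim.nonEmpty =>
      (label, content + ", " + line.trim) :: rest  // extend the current field
    case (acc, _) =>
      acc                                          // blank line, or text before any label
  }.reverse

val fields = groupFields(List(
  "ID:   12343-7888",
  "Name:  Mary, Bob",
  "           Harry, Steve",
  "City:   New York",
  "        Tampa, Bend   "
))
fields.foreach { case (label, content) => println(label + " -> " + content) }
```

It is shorter, but it packs the line classification and the grouping into one step, which makes it harder to debug when the input is messier than expected.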