I am trying to parse a text file. My input file looks like this:
ID: 12343-7888
Name: Mary, Bob, Jason, Jeff, Suzy
Harry, Steve
Larry, George
City: New York, Portland, Dallas, Kansas City
Tampa, Bend
The expected output would be:
"12343-7888"
"Mary, Bob, Jason, Jeff, Suzy, Harry, Steve, Larry, George"
"New York, Portland, Dallas, Kansas City, Tampa, Bend"
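Note that the "Name" and "City" fields contain newlines or carriage returns. I have the code below, but it does not work: the second line of code puts each character on its own line. I am also having trouble extracting just the data from a field, for example returning only the actual names without "Name:" being part of the result. Finally, I would like each returned field to be wrapped in quotes.
Can you help me solve this?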
val lines = Source.fromFile("/filesdata/logfile.text").getLines().toList
val record = lines.dropWhile(line => !line.startsWith("Name: ")).takeWhile(line => !line.startsWith("Address: ")).flatMap(_.split(",")).map(_.trim()).filter(_.nonEmpty).mkString(", ")
val final results record.map(s => "\"" + s + "\"").mkString(",\n")
How can I get the result I want?
Answer 0 (score: 1)
SHORT ANSWER
A two-liner that produces a string that looks exactly like what you specified:
println(lines.map{line => if(line.trim.matches("[a-zA-Z]+:.*"))
("\"\n\"" + line.split(":")(1).trim) else (", " + line.trim)}.mkString.drop(2) + "\"")
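If the two-liner is hard to read, the same idea (a labeled line starts a new field, any other line continues the previous one) can be spelled out as a fold. This is only a sketch under a few assumptions: the names `GroupFields`, `Labeled`, `groupFields` and `quote` are made up for illustration, a label is assumed to consist of ASCII letters followed by a colon, and blank lines are skipped.

```scala
object GroupFields {
  // Matches a line that opens a new field, e.g. "Name: Mary, Bob".
  // Group 1 is the label, group 2 is everything after the first colon.
  // (Assumption: labels are plain ASCII letters.)
  private val Labeled = """\s*([A-Za-z]+):(.*)""".r

  /** Returns one string per field, with continuation lines appended. */
  def groupFields(lines: List[String]): List[String] =
    lines
      .filter(_.trim.nonEmpty)
      .foldLeft(List.empty[String]) { (acc, line) =>
        (line, acc) match {
          // a labeled line starts a new field; the label itself is dropped
          case (Labeled(_, value), _) => value.trim :: acc
          // an unlabeled line continues the most recent field
          case (_, current :: done) =>
            List(current, line.trim).filter(_.nonEmpty).mkString(", ") :: done
          // an unlabeled line before any label is kept as its own field
          case (_, Nil) => List(line.trim)
        }
      }
      .reverse

  def quote(field: String): String = "\"" + field + "\""

  def main(args: Array[String]): Unit = {
    val lines = List(
      "ID: 12343-7888",
      "Name: Mary, Bob, Jason, Jeff, Suzy",
      "Harry, Steve",
      "Larry, George",
      "City: New York, Portland, Dallas, Kansas City",
      "Tampa, Bend"
    )
    groupFields(lines).map(quote).foreach(println)
  }
}
```

Unlike `line.split(":")(1)`, the regex keeps everything after the first colon, so a value that itself contains a colon is not truncated.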
LONG ANSWER
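Why try to solve the problem in one line, if you can achieve the same in 94?
(This is exactly the opposite of the usual motto when working with Scala collections, but the input was messy enough that I found it worthwhile to actually write out some of the intermediate steps. Maybe that's just because I recently bought a nice new keyboard...)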
val input = """ID: 12343-7888
Name: Mary, Bob, Jason, Jeff, Suzy
Harry, Steve
Larry, George
City: New York, Portland, Dallas, Kansas City
Tampa, Bend
ID: 567865-676
Name: Alex, Bob
Chris, Dave
Evan, Frank
Gary
City: Los Angeles, St. Petersburg
Washington D.C., Phoenix
"""
case class Entry(id: String, names: List[String], cities: List[String])

def parseMessyInput(input: String): List[Entry] = {

  // just a first rough approximation of the structure of the input
  sealed trait MessyInputLine { def content: String }
  case class IdLine(content: String) extends MessyInputLine
  case class NameLine(content: String) extends MessyInputLine
  case class UnlabeledLine(content: String) extends MessyInputLine
  case class CityLine(content: String) extends MessyInputLine

  val lines = input.split("\n").toList

  // a helper function for checking whether a line starts with a label
  def tryParseLabeledLine
    (label: String, line: String)
    (cons: String => MessyInputLine)
  : Option[MessyInputLine] = {
    if (line.startsWith(label + ":")) {
      Some(cons(line.drop(label.size + 1)))
    } else {
      None
    }
  }

  val messyLines: List[MessyInputLine] = for (line <- lines) yield {
    (
      tryParseLabeledLine("Name", line){NameLine(_)} orElse
      tryParseLabeledLine("City", line){CityLine(_)} orElse
      tryParseLabeledLine("ID", line){IdLine(_)}
    ).getOrElse(UnlabeledLine(line))
  }

  /** Combines the content of the first line with the content
    * of all unlabeled lines, until the next labeled line or
    * the end of the list is hit. Returns the content of
    * the first few lines and the list of the remaining lines.
    */
  def readUntilNextLabel(messyLines: List[MessyInputLine])
  : (List[String], List[MessyInputLine]) = {
    messyLines match {
      case Nil => (Nil, Nil)
      case h :: t => {
        val (unlabeled, rest) = t.span {
          case UnlabeledLine(_) => true
          case _ => false
        }
        (h.content :: unlabeled.map(_.content), rest)
      }
    }
  }

  /** Glues multiple lines to entries */
  def combineToEntries(messyLines: List[MessyInputLine]): List[Entry] = {
    if (messyLines.isEmpty) Nil
    else {
      val (idContent, namesCitiesRest) = readUntilNextLabel(messyLines)
      val (namesContent, citiesRest) = readUntilNextLabel(namesCitiesRest)
      val (citiesContent, rest) = readUntilNextLabel(citiesRest)
      val id = idContent.head.trim
      val names = namesContent.map(_.split(",").map(_.trim).toList).flatten
      val cities = citiesContent.map(_.split(",").map(_.trim).toList).flatten
      Entry(id, names, cities) :: combineToEntries(rest)
    }
  }

  // invoke recursive function on the entire input
  combineToEntries(messyLines)
}
// how to use
val entries = parseMessyInput(input)
// output
for (Entry(id, names, cities) <- entries) {
  println(id)
  println(names.mkString(", "))
  println(cities.mkString(", "))
}
Output:
12343-7888
Mary, Bob, Jason, Jeff, Suzy, Harry, Steve, Larry, George
New York, Portland, Dallas, Kansas City, Tampa, Bend
567865-676
Alex, Bob, Chris, Dave, Evan, Frank, Gary
Los Angeles, St. Petersburg, Washington D.C., Phoenix
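The output above has no quotes around the fields, which the question asked for. One possible way to add them is a small rendering helper. The sketch below repeats the `Entry` case class only so that it compiles on its own; `QuotedEntries`, `quote` and `render` are hypothetical names that are not part of the code above.

```scala
object QuotedEntries {
  // Same shape as the Entry case class used above,
  // repeated here so the sketch is self-contained.
  final case class Entry(id: String, names: List[String], cities: List[String])

  def quote(s: String): String = "\"" + s + "\""

  /** Renders one entry as three quoted lines: id, names, cities. */
  def render(e: Entry): String =
    List(e.id, e.names.mkString(", "), e.cities.mkString(", "))
      .map(quote)
      .mkString("\n")

  def main(args: Array[String]): Unit = {
    val entry = Entry("567865-676", List("Alex", "Bob"), List("Los Angeles", "Phoenix"))
    println(render(entry))
  }
}
```

With the parsed entries from above, `entries.map(render).foreach(println)` would print every record in the quoted form.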
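You probably could write it in one line, sooner or later. But if you write dumb code consisting of many simple intermediate steps, you don't have to think as hard, and there are no obstacles large enough to get stuck on.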
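As for why the attempt in the question put every character on its own line: `record` is already a single `String` (the result of `mkString(", ")`), and calling `map` on a `String` visits it one `Char` at a time. A minimal sketch of the effect, with the made-up names `CharPerLine`, `perChar` and `quoted`:

```scala
object CharPerLine {
  // mkString has already collapsed the names into one String
  val record: String = List("Mary", "Bob").mkString(", ")

  // map on a String iterates over its Chars, so each character
  // is quoted separately and then joined with ",\n"
  val perChar: String = record.map(c => "\"" + c + "\"").mkString(",\n")

  // quoting the field as a whole needs no map at all
  val quoted: String = "\"" + record + "\""

  def main(args: Array[String]): Unit = {
    println(perChar)
    println(quoted)
  }
}
```

So the quotes have to be added to the whole field (or to a list of fields), not mapped over the joined string.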