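A text file is to be parsed line by line using Scala pattern matching and regular expressions. If a line starts with "names:\t",
the tab-separated names that follow should be made available as a Seq[String]
(or something similar).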
Here is a code example that does not work:
val Names = "^names:(?:\t([a-zA-Z0-9_]+))+$".r
"names:\taaa\tbbb\tccc" match {
case Names(names @ _*) => println(names)
// […] other cases
case _ => println("no match")
}
Output: List(ccc)
Desired output: List(aaa, bbb, ccc)
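The first attempt prints only List(ccc) because a capturing group inside a repetition (here the + after (?:\t(...))) keeps only the text from its last iteration. The number of groups returned is fixed by the pattern, not by how often it repeats. The sketch below illustrates this with the question's regex and, as one possible workaround that is not taken from the question, collects every name with findAllMatchIn. RepeatedGroupDemo and its method names are hypothetical.

```scala
object RepeatedGroupDemo {
  // The regex from the question: the capturing group sits inside a `+` repetition.
  val Names = "^names:(?:\t([a-zA-Z0-9_]+))+$".r

  // The regex engine keeps only what the group matched on its final iteration.
  def lastCapture(line: String): Option[String] =
    Names.findFirstMatchIn(line).map(_.group(1))

  // The group count is a property of the pattern (here 1), not of the input.
  def groupCount(line: String): Option[Int] =
    Names.findFirstMatchIn(line).map(_.groupCount)

  // One way to collect every name: apply a per-name pattern repeatedly.
  // Note: this does not validate the "names:" prefix or the rest of the line.
  private val Name = "\t([a-zA-Z0-9_]+)".r
  def allNames(line: String): List[String] =
    Name.findAllMatchIn(line).map(_.group(1)).toList

  def main(args: Array[String]): Unit = {
    val line = "names:\taaa\tbbb\tccc"
    println(groupCount(line))  // Some(1)
    println(lastCapture(line)) // Some(ccc)
    println(allNames(line))    // List(aaa, bbb, ccc)
  }
}
```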
The following code works as desired...
object NamesObject {
private val NamesLine = "^names:\t([a-zA-Z0-9_]+(?:\t[a-zA-Z0-9_]+)*)$".r
def unapplySeq(s: String): Option[Seq[String]] = s match {
case NamesLine(nameString) => Some(nameString.split("\t"))
case _ => None
}
}
"names:\taaa\tbbb\tccc" match {
case NamesObject(names @ _*) => println(names)
// […] other cases
case _ => println("no match")
}
Output (as desired): WrappedArray(aaa, bbb, ccc)
I wonder: can this be done more simply, without creating an object,
as in the first (non-working) code example?
Answer 0 (score: 1)
Use your working regex. (\w
is the predefined character class for [a-zA-Z0-9_].)
val Names = """names:\t(\w+(?:\t\w+)*)""".r
"names:\taaa\tbbb\tccc" match {
case Names(names) => println(names.split("\t").toSeq)
case _ => println("no match")
}
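Splitting inside the case body also composes with ordinary Seq patterns, so arity-based cases like those in the question (one name, two names, many names) can be written without a custom extractor object. A minimal sketch; SplitThenMatch and describe are hypothetical names:

```scala
object SplitThenMatch {
  // The working regex from the answer: one group captures all tab-separated names.
  val Names = """names:\t(\w+(?:\t\w+)*)""".r

  def describe(line: String): String = line match {
    case Names(all) =>
      // Split the single capture, then match on the shape of the resulting list.
      all.split("\t").toList match {
        case List(one)  => s"one name: $one"
        case List(a, b) => s"two names: $a and $b"
        case many       => "many names: " + many.mkString(", ")
      }
    case _ => "no match"
  }

  def main(args: Array[String]): Unit = {
    println(describe("names:\taaa"))           // one name: aaa
    println(describe("names:\taaa\tbbb"))      // two names: aaa and bbb
    println(describe("names:\taaa\tbbb\tccc")) // many names: aaa, bbb, ccc
    println(describe("ints:\t1"))              // no match
  }
}
```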
To bind the first name, the second name, and the rest (tail):
val Names = """names:\t(\w+)?\t?(\w+)?\t?((?:\w+?\t?)*)""".r
"names:\taaa\tbbb\tccc\tddd" match {
case Names(first, second, tail) =>
println(first + ", " + second + ", " + tail.split("\t").toSeq)
case _ => println("no match")
}
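Because every group in that regex is optional, lines with fewer than two names can yield null for the unmatched groups. An alternative sketch, not taken from the answer, keeps the single-group regex and binds first, second and tail with a list pattern on the split result, returning None when there are fewer than two names. FirstSecondTail and firstSecondTail are hypothetical names.

```scala
object FirstSecondTail {
  // Single group capturing the whole tab-separated list of names.
  val Names = """names:\t(\w+(?:\t\w+)*)""".r

  // Some((first, second, rest)) for lines with at least two names, otherwise None.
  def firstSecondTail(line: String): Option[(String, String, List[String])] = line match {
    case Names(all) =>
      all.split("\t").toList match {
        case first :: second :: tail => Some((first, second, tail))
        case _                       => None // only one name on the line
      }
    case _ => None
  }

  def main(args: Array[String]): Unit = {
    println(firstSecondTail("names:\taaa\tbbb\tccc\tddd")) // Some((aaa,bbb,List(ccc, ddd)))
    println(firstSecondTail("names:\taaa"))                // None
  }
}
```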
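Answer 1 (score: 0)
As Randall Schulz said, it does not seem possible to do this with a regex alone. So the short answer to my question is no.
My current solution is as follows. I use this class...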
import scala.util.matching.Regex
class SeparatedLinePattern(Pattern: Regex, separator: String = "\t") {
def unapplySeq(s: String): Option[Seq[String]] = s match {
case Pattern(nameString) => Some(nameString.split(separator).toSeq)
case _ => None
}
}
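One caveat with this class: String.split interprets its argument as a regular expression, so a separator such as "|" or "." would not be treated literally. Assuming literal separators are wanted, a variant can quote the separator with java.util.regex.Pattern.quote. LiteralSeparatedLinePattern, LiteralSeparatorDemo and the "tags:" line format are hypothetical.

```scala
import java.util.regex.Pattern
import scala.util.matching.Regex

// Hypothetical variant of SeparatedLinePattern in which the separator is matched literally.
class LiteralSeparatedLinePattern(val linePattern: Regex, separator: String = "\t") {
  // Pattern.quote escapes regex metacharacters, so "|" splits only on a literal pipe.
  private val splitRegex: String = Pattern.quote(separator)

  def unapplySeq(s: String): Option[Seq[String]] = s match {
    case linePattern(joined) => Some(joined.split(splitRegex).toSeq)
    case _                   => None
  }
}

object LiteralSeparatorDemo {
  // Pipe-separated tags; "|" is a regex metacharacter, hence the quoting above.
  val Tags = new LiteralSeparatedLinePattern("""tags:\|(\w+(?:\|\w+)*)""".r, "|")

  def tags(line: String): Seq[String] = line match {
    case Tags(ts @ _*) => ts
    case _             => Nil
  }

  def main(args: Array[String]): Unit = {
    println(tags("tags:|foo|bar|baz").mkString(", ")) // foo, bar, baz
    println(tags("names:\taaa"))                      // List()
  }
}
```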
...to create the patterns:
val Names = new SeparatedLinePattern("""names:\t([A-Za-z]+(?:\t[A-Za-z]+)*)""".r)
val Ints = new SeparatedLinePattern("""ints:\t(\d+(?:\t\d+)*)""".r)
val ValuesWithID = new SeparatedLinePattern("""id-value:\t(\d+\t\w+(?:\t\d+\t\w+)*)""".r)
Here is an example of how they can be used in a very flexible way:
val testStrings = List("names:\taaa", "names:\tbbb\tccc", "names:\tddd\teee\tfff\tggg\thhh",
"ints:\t123", "ints:\t456\t789", "ints:\t100\t200\t300",
"id-value:\t42\tbaz", "id-value:\t2\tfoo\t5\tbar\t23\tbla")
for (s <- testStrings) s match {
case Names(name) => println(s"The name is '$name'")
case Names(a, b) => println(s"The two names are '$a' and '$b'")
case Names(names @ _*) => println("Many names: " + names.mkString(", "))
case Ints(a) => println(s"Just $a")
case Ints(a, b) => println(s"$a + $b == ${a.toInt + b.toInt}")
case Ints(nums @ _*) => println("Sum is " + (nums map (_.toInt)).sum)
case ValuesWithID(id, value) => println(s"ID of '$value' is $id")
case ValuesWithID(values @ _*) => println("As map: " + (values.grouped(2) map (x => x(0).toInt -> x(1))).toMap)
case _ => println("No match")
}
Output:
The name is 'aaa'
The two names are 'bbb' and 'ccc'
Many names: ddd, eee, fff, ggg, hhh
Just 123
456 + 789 == 1245
Sum is 600
ID of 'baz' is 42
As map: Map(2 -> foo, 5 -> bar, 23 -> bla)