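I want to parse the following test data. It works for 3 of the cases, so I think there is a problem with my regular expressions. If a line starts with # and its comment also starts with #, parsing stops working. Can someone explain why? Here is my solution so far: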
val testDate =
"""
|127.0.0.1 ads234.com
|#127.0.0.1 auto.search.msn.com # Microsoft uses this server to redirect
|#127.0.0.1 sitefinder.verisign.com # Verisign has joined the game
|#127.0.0.1 sitefinder-idn.verisign.com # of trying to hijack mistyped
|#127.0.0.1 s0.2mdn.net # This may interfere with some streaming
|#127.0.0.1 ad.doubleclick.net # This may interfere with www.sears.com
|127.0.0.1 media.fastclick.net # Likewise, this may interfere with some
|127.0.0.1 cdn.fastclick.net
""".stripMargin
I want to keep the leading # and the comment, if present.
object Example extends RegexParsers {
def comment: Parser[String] = """#.*""".r
def url: Parser[String] = """[A-Za-z0-9-\.\_\-]{1,65}(?<!-)\.+[A-Za-z]{2,7}""".r
def localhost: Parser[String] = """\b(\d{1,3}\.){3}\d{1,3}\b""".r
def pound: Parser[String] = "#".r
def port: Parser[String] = """:\d{3}""".r
def urlPort = url | url <~ port
def pos1 = localhost ~ urlPort ^^ {
case _ ~ dns => LineParsed("", dns, "")
}
def pos2 = pound ~ localhost ~ urlPort ^^ {
case p ~ _ ~ dns => LineParsed(p, dns, "")
}
def pos3 = localhost ~ urlPort ~ comment ^^ {
case _ ~ dns ~ com => LineParsed("", dns, com)
}
def pos4 = pound ~ localhost ~ urlPort ~ comment ^^ {
case p ~ _ ~ dns ~ com => LineParsed(p, dns, com)
}
def linePos = pos1 | pos2 | pos3 | pos4
def fullLine = repsep(linePos, """\W*""".r)
}
I get the following exception:
#127.0.0.1 auto.search.msn.com # Microsoft uses this server to redirect
^
java.lang.RuntimeException: No result when parsing failed
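Answer 0 (score: 1)
There are a few mistakes in your code. First, newlines are treated as whitespace by default, but you need to "see" them in order to split the entries correctly. So you need to redefine whitespace: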
object Example extends RegexParsers {
override protected val whiteSpace: Regex = "[ \t]+".r
Then write the fullLine parser as:
//allow several empty lines at the beginning and between entries
def fullLine = rep("\n") ~> repsep(linePos, rep1("\n"))
(Another option is to split the input into lines beforehand and parse each line separately.)
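To illustrate the whitespace point, here is a minimal sketch comparing the default whiteSpace with a redefined one. It assumes the scala-parser-combinators library is on the classpath; the object names, the word parser, and the sample input are made up for the illustration and are not part of the question.

```scala
import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers

// Default behaviour: whiteSpace is """\s+""", so "\n" is skipped like a space.
object DefaultWhitespace extends RegexParsers {
  def word: Parser[String] = "[a-z]+".r
  def words: Parser[List[String]] = rep(word)
}

// Only spaces and tabs are skipped, so "\n" can act as an explicit separator.
object LineAwareWhitespace extends RegexParsers {
  override protected val whiteSpace: Regex = "[ \t]+".r
  def word: Parser[String] = "[a-z]+".r
  def line: Parser[List[String]] = rep1(word)
  def lines: Parser[List[List[String]]] = repsep(line, "\n")
}

object WhitespaceDemo {
  val input = "a b\nc d"
  // The line structure is lost: all four words end up in one flat list.
  val flat = DefaultWhitespace.parseAll(DefaultWhitespace.words, input).get
  // The line structure is kept: one inner list per line.
  val grouped = LineAwareWhitespace.parseAll(LineAwareWhitespace.lines, input).get
}
```

With the default setting the parser cannot tell where one entry ends and the next begins, which is why the separator has to be made visible.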
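The next mistake is the way you combine parsers with the | operator. To parse A optionally followed by B, do not write A | A ~ B. It will never try to read B after reading A, because the left-hand alternative has already succeeded. Write A ~ B.? instead: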
def urlPort = url <~ port.? // But anyway, you'll never have a port in a hosts file!
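The difference can be seen in a small sketch with two literal parsers. It assumes the scala-parser-combinators library; ChoiceDemo, a, b, shadowed and optional are illustrative names, not taken from the question.

```scala
import scala.util.parsing.combinator.RegexParsers

object ChoiceDemo extends RegexParsers {
  def a: Parser[String] = "a"
  def b: Parser[String] = "b"

  // Ordered choice: once the left `a` succeeds, `a ~ b` is never tried.
  def shadowed: Parser[Any] = a | a ~ b

  // `a`, optionally followed by `b`.
  def optional: Parser[String ~ Option[String]] = a ~ b.?

  // `shadowed` stops after "a" and leaves "b" unconsumed, so parseAll rejects "ab".
  val shadowedOk = parseAll(shadowed, "ab").successful
  // `optional` accepts the input both with and without the trailing "b".
  val optionalOk = parseAll(optional, "ab").successful
  val optionalAlone = parseAll(optional, "a").successful
}
```

Putting the longer alternative first (a ~ b | a) would also work, but the optional form avoids parsing a twice.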
Likewise, the four cases pos1 | pos2 | pos3 | pos4 can be simplified considerably:
def linePos = pound.? ~ localhost ~ urlPort ~ comment.? ^^ {
case p ~ _ ~ dns ~ com => LineParsed(p.getOrElse(""), dns, com.getOrElse(""))
}
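Here you can see how the ? combinator yields an Option for p and com. I used getOrElse to fit the structure of LineParsed and to keep the original behavior of your code, but a more Scala-ish approach would be to keep them as Option fields in the LineParsed class.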
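A minimal sketch of that Option-based variant might look like the following. LineParsedOpt and its field names are hypothetical, since the question does not show how LineParsed is defined; the regular expressions are copied from the question, the port parser is left out for brevity, and the scala-parser-combinators library is assumed.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical Option-based counterpart of LineParsed (the original class is not shown).
final case class LineParsedOpt(pound: Option[String], dns: String, comment: Option[String])

object OptionExample extends RegexParsers {
  def comment: Parser[String] = """#.*""".r
  def url: Parser[String] = """[A-Za-z0-9-\.\_\-]{1,65}(?<!-)\.+[A-Za-z]{2,7}""".r
  def localhost: Parser[String] = """\b(\d{1,3}\.){3}\d{1,3}\b""".r
  def pound: Parser[String] = "#".r

  // The Option values produced by `?` are passed through unchanged.
  def linePos: Parser[LineParsedOpt] = pound.? ~ localhost ~ url ~ comment.? ^^ {
    case p ~ _ ~ dns ~ com => LineParsedOpt(p, dns, com)
  }

  // Single lines only, so the default whitespace handling is sufficient here.
  val commented = parseAll(linePos, "#127.0.0.1 s0.2mdn.net # This may interfere")
  val plain = parseAll(linePos, "127.0.0.1 cdn.fastclick.net")
}
```

Callers can then pattern match on None and Some instead of comparing against empty strings.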
Here is the final working code that parses your example:
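import scala.util.matching.Regex
import scala.util.parsing.combinator.RegexParsers
// Assumed shape: the question does not show the definition of LineParsed.
case class LineParsed(pound: String, dns: String, comment: String)
object Example extends RegexParsers {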
override protected val whiteSpace: Regex = "[ \t]+".r
def comment: Parser[String] = """#.*""".r
def url: Parser[String] = """[A-Za-z0-9-\.\_\-]{1,65}(?<!-)\.+[A-Za-z]{2,7}""".r
def localhost: Parser[String] = """\b(\d{1,3}\.){3}\d{1,3}\b""".r
def pound: Parser[String] = "#".r
def port: Parser[String] = """:\d{3}""".r
def urlPort = url <~ port.?
def linePos = pound.? ~ localhost ~ urlPort ~ comment.? ^^ {
case p ~ _ ~ dns ~ com => LineParsed(p.getOrElse(""), dns, com.getOrElse(""))
}
def fullLine = rep("\n") ~> repsep(linePos, rep1("\n"))
}
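For comparison, the alternative mentioned earlier (splitting the input into lines first and parsing each one on its own) could be sketched roughly as follows. This is one possible reading of that suggestion, not code from the answer: PerLineExample and parseHosts are hypothetical names, LineParsed is redefined with assumed field names so the sketch compiles on its own, and the scala-parser-combinators library is assumed.

```scala
import scala.util.parsing.combinator.RegexParsers

// Assumed shape of LineParsed; the question does not show its definition.
final case class LineParsed(pound: String, dns: String, comment: String)

object PerLineExample extends RegexParsers {
  def comment: Parser[String] = """#.*""".r
  def url: Parser[String] = """[A-Za-z0-9-\.\_\-]{1,65}(?<!-)\.+[A-Za-z]{2,7}""".r
  def localhost: Parser[String] = """\b(\d{1,3}\.){3}\d{1,3}\b""".r
  def pound: Parser[String] = "#".r
  def port: Parser[String] = """:\d{3}""".r
  def urlPort: Parser[String] = url <~ port.?

  def linePos: Parser[LineParsed] = pound.? ~ localhost ~ urlPort ~ comment.? ^^ {
    case p ~ _ ~ dns ~ com => LineParsed(p.getOrElse(""), dns, com.getOrElse(""))
  }

  // Split first, then parse every non-blank line independently.
  // A line that does not parse is reported as a Left instead of aborting the run.
  def parseHosts(text: String): List[Either[String, LineParsed]] =
    text.split("\n").toList.map(_.trim).filter(_.nonEmpty).map { line =>
      parseAll(linePos, line) match {
        case Success(result, _) => Right(result)
        case failure: NoSuccess => Left(s"$line: ${failure.msg}")
      }
    }
}
```

Because each line is parsed separately, the default whitespace handling can stay as it is, and one malformed line does not prevent the remaining lines from being parsed.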