I'm experimenting with parser combinators and I keep running into infinite recursion. Here is the first case I hit:
import util.parsing.combinator.Parsers
import util.parsing.input.CharSequenceReader
class CombinatorParserTest extends Parsers {
  type Elem = Char
  def notComma = elem("not comma", _ != ',')
  def notEndLine = elem("not end line", x => x != '\r' && x != '\n')
  def text = rep(notComma | notEndLine)
}
object CombinatorParserTest {
  def main(args: Array[String]): Unit = {
    val p = new CombinatorParserTest()
    val r = p.text(new CharSequenceReader(","))
    // does not get here
    println(r)
  }
}
How can I print what is going on? Why doesn't it finish?
Answer 0 (score: 4)
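Logging the attempts to parse notComma and notEndLine shows that it is the end-of-file (displayed as CTRL-Z in the log(...)("mesg") output) that is being parsed over and over. Here is how I modified the parser for that purpose: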
def text = rep(log(notComma)("notComma") | log(notEndLine)("notEndLine"))
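I'm not entirely sure what is going on (I tried many variations on your grammar), but I think it is something like this: the EOF is not really a character artificially introduced into the input stream, but rather a kind of permanent condition at the end of the input. So this never-consumed EOF pseudo-character is repeatedly parsed as "either not a comma or not an end of line".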
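The "permanent condition" described above can be observed directly on the reader. The following is a minimal sketch, assuming the scala-parser-combinators library is on the classpath; the object name EofDemo is made up for illustration.

```scala
import scala.util.parsing.input.CharSequenceReader

// Hypothetical demo object; only CharSequenceReader comes from the library.
object EofDemo {
  val start = new CharSequenceReader(",")
  // Position just past the single ',' character, i.e. the end of the input.
  val end = start.rest

  def main(args: Array[String]): Unit = {
    println(end.atEnd)                             // true: nothing left to read
    println(end.first == CharSequenceReader.EofCh) // true: first still yields the EOF sentinel
    println(end.rest eq end)                       // true: rest no longer advances
  }
}
```

Because first keeps returning the sentinel and rest keeps returning the same reader, a predicate that accepts the sentinel can succeed indefinitely without making progress, which is consistent with the repeated parsing seen in the log.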
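Answer 1 (score: 2)
OK, I think I've figured it out. CharSequenceReader returns '\032' (CTRL-Z, available as CharSequenceReader.EofCh) as the marker for end of input. So if I modify my parser like this, it works: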
import util.parsing.combinator.Parsers
import util.parsing.input.CharSequenceReader
class CombinatorParserTest extends Parsers {
  type Elem = Char
  import CharSequenceReader.EofCh
  def notComma = elem("not comma", x => x != ',' && x != EofCh)
  def notEndLine = elem("not end line", x => x != '\r' && x != '\n' && x != EofCh)
  //def text = rep(notComma | notEndLine)
  def text = rep(log(notComma)("notComma") | log(notEndLine)("notEndLine"))
}
object CombinatorParserTest {
  def main(args: Array[String]): Unit = {
    val p = new CombinatorParserTest()
    val r = p.text(new CharSequenceReader(","))
    println(r)
  }
}
See the source code of CharSequenceReader for details. If the scaladoc had mentioned this, it would have saved me a lot of time.
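Since the EofCh check has to be repeated in every predicate, one option is to factor it into a helper. This is only a sketch: the object EofSafeParsers and the helper noneOf are hypothetical names, not part of the library, and it assumes scala-parser-combinators is on the classpath.

```scala
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader
import scala.util.parsing.input.CharSequenceReader.EofCh

// Hypothetical wrapper object; elem, rep and | are the library's own combinators.
object EofSafeParsers extends Parsers {
  type Elem = Char

  // Hypothetical helper: accepts any character that is neither listed in
  // `excluded` nor the end-of-input sentinel EofCh.
  def noneOf(label: String, excluded: Char*): Parser[Char] =
    elem(label, c => c != EofCh && !excluded.contains(c))

  val notComma: Parser[Char]   = noneOf("not comma", ',')
  val notEndLine: Parser[Char] = noneOf("not end line", '\r', '\n')
  val text: Parser[List[Char]] = rep(notComma | notEndLine)

  def parse(s: String): ParseResult[List[Char]] = text(new CharSequenceReader(s))

  def main(args: Array[String]): Unit =
    println(parse(",")) // terminates: the repetition stops at end of input
}
```

Judging by the "end of input" failures in the log output further down, some versions of the library appear to reject the end-of-input position on their own; the explicit check should be harmless there.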
Answer 2 (score: 0)
I find the logging function really awkward to type. Why do I have to write log(parser)("string")? Why not something as simple as parser.log("string")? Anyway, to get around that, I made this:
trait Logging { self: Parsers =>
  // Used to turn logging on or off
  val debug: Boolean
  // Much easier than having to wrap a parser with a log function and type a message
  // i.e. log(someParser)("Message") vs someParser.log("Message")
  implicit class Logged[+A](parser: Parser[A]) {
    def log(msg: String): Parser[A] =
      if (debug) self.log(parser)(msg) else parser
  }
}
Now, in your parser, you can mix in this trait like so:
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader
object CombinatorParserTest extends App with Parsers with Logging {
  type Elem = Char
  override val debug: Boolean = true
  def notComma: Parser[Char] = elem("not comma", _ != ',')
  def notEndLine: Parser[Char] = elem("not end line", x => x != '\r' && x != '\n')
  def text: Parser[List[Char]] = rep(notComma.log("notComma") | notEndLine.log("notEndLine"))
  val r = text(new CharSequenceReader(","))
  println(r)
}
If needed, you can also override the debug field to turn logging off.
Running it also shows that it is the second parser that correctly parses the comma:
trying notComma at scala.util.parsing.input.CharSequenceReader@506e6d5e
notComma --> [1.1] failure: not comma expected
,
^
trying notEndLine at scala.util.parsing.input.CharSequenceReader@506e6d5e
notEndLine --> [1.2] parsed: ,
trying notComma at scala.util.parsing.input.CharSequenceReader@15975490
notComma --> [1.2] failure: end of input
,
 ^
trying notEndLine at scala.util.parsing.input.CharSequenceReader@15975490
notEndLine --> [1.2] failure: end of input
,
 ^
The result is List(,)
Process finished with exit code 0
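For comparison, the debug switch described above can be set to false. The sketch below repeats a version of the Logging trait so that it compiles on its own; the object name QuietParserTest is hypothetical, the variance annotation is dropped for simplicity, and the EofCh checks from the earlier answer are included as a precaution for library versions that do not stop at end of input by themselves. It assumes scala-parser-combinators is on the classpath.

```scala
import scala.util.parsing.combinator.Parsers
import scala.util.parsing.input.CharSequenceReader
import scala.util.parsing.input.CharSequenceReader.EofCh

// Repeated from the answer above (without the variance annotation) to keep the sketch self-contained.
trait Logging { self: Parsers =>
  val debug: Boolean
  implicit class Logged[A](parser: Parser[A]) {
    def log(msg: String): Parser[A] =
      if (debug) self.log(parser)(msg) else parser
  }
}

// Hypothetical object name; logging is switched off here.
object QuietParserTest extends Parsers with Logging {
  type Elem = Char
  override val debug: Boolean = false

  def notComma: Parser[Char] = elem("not comma", x => x != ',' && x != EofCh)
  def notEndLine: Parser[Char] =
    elem("not end line", x => x != '\r' && x != '\n' && x != EofCh)
  def text: Parser[List[Char]] =
    rep(notComma.log("notComma") | notEndLine.log("notEndLine"))

  def main(args: Array[String]): Unit = {
    // With debug = false, .log returns the parser unchanged,
    // so no "trying ..." lines are printed.
    println(text(new CharSequenceReader(",")))
  }
}
```

The parse result should be the same as with logging enabled; only the trace lines disappear.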