Parsec parser fails with the many1 and manyTill combinators

Time: 2013-07-25 13:30:10

Tags: haskell parsec

I've run into some confusing behavior with a Parsec parser. I want to parse strings like the following:

>  <CdId>
1

>  <Mol Weight>
270.2369

>  <Formula>
C15H10O5

>  <LOG_ER_RBA>
-0.36

>  <ACTIVITY>
1

I wrote a parser:

 parseProperties = do       
        skipMany1 newline
        char '>' >> spaces >> char '<' 
        propName <- many1 (noneOf ">")
        char '>'
        newline
        propValue <- many1 (noneOf "\n")
        return (propName,propValue)

This parser handles a single item perfectly well, and it can also parse several in a row:

parseTest (count 5 parseProperties) "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"

Result:

 [("CdId","1"),("Mol Weight","270.2369"),("Formula","C15H10O5"),("LOG_ER_RBA","-0.36"),("ACTIVITY","1")]

However, I can't find a way to parse an arbitrary number of properties. If I try either of these:

parseTest (many1 parseProperties) "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"

parseTest (manyTill parseProperties (try eof)) "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"

the parser fails:

parse error at (line 17, column 1):
unexpected end of input
expecting new-line or ">"

However, if I use the anyChar parser instead, it does not fail:

parseTest (manyTill anyChar (try eof)) "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"

"\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"

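2 Answers:

Answer 0 (score: 2)

In your example the parseProperties parser is run repeatedly until eof is encountered. The problem is that parseProperties does not consume the trailing newlines, so after the last property has been parsed the remaining input is "\n\n". That does not satisfy your termination condition, because it is not the end of input. parseProperties is therefore tried once more: it consumes the newlines, but then fails when it tries to match '>'. Because it has already consumed input by the time it fails, many1 and manyTill report the error instead of stopping cleanly.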
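To see the leftover input for yourself, here is a minimal sketch. It assumes the parser from the question (with a type signature added) and uses Parsec's getInput to return whatever remains after a fixed number of properties. The names leftoverAfter and sample are made up for illustration, and sample is a shortened version of the input from the question.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The parser from the question, with a type signature added.
parseProperties :: Parser (String, String)
parseProperties = do
  skipMany1 newline
  _ <- char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")
  _ <- char '>'
  _ <- newline
  propValue <- many1 (noneOf "\n")
  return (propName, propValue)

-- Hypothetical helper: run the parser n times, then return the unconsumed input.
leftoverAfter :: Int -> String -> Either ParseError String
leftoverAfter n = parse (count n parseProperties >> getInput) ""

-- Shortened sample with two properties and the same trailing newlines.
sample :: String
sample = "\n>  <CdId>\n1\n\n>  <ACTIVITY>\n1\n\n"
```

In GHCi, leftoverAfter 2 sample should give Right "\n\n": the two trailing newlines are still in the input, so eof cannot match yet.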

Try changing the parseTest call to the following:

test = "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n>  <Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n"

parseTest (manyTill parseProperties $ try (skipMany newline >> eof)) test

The try here strips any leading newlines before checking whether the end of input has been reached.
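For reference, here is the same idea as a self-contained sketch that compiles as a module. parseProperties is the parser from the question; propertiesTillEof and sample are hypothetical names used only for this example.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The parser from the question, with a type signature added.
parseProperties :: Parser (String, String)
parseProperties = do
  skipMany1 newline
  _ <- char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")
  _ <- char '>'
  _ <- newline
  propValue <- many1 (noneOf "\n")
  return (propName, propValue)

-- Hypothetical wrapper: stop once only newlines remain before end of input.
-- try rewinds the skipped newlines whenever eof does not follow them,
-- so parseProperties still sees the newlines it requires.
propertiesTillEof :: String -> Either ParseError [(String, String)]
propertiesTillEof =
  parse (manyTill parseProperties (try (skipMany newline >> eof))) ""

-- Shortened sample with two properties and trailing newlines.
sample :: String
sample = "\n>  <CdId>\n1\n\n>  <ACTIVITY>\n1\n\n"
```

With this, propertiesTillEof sample should return Right [("CdId","1"),("ACTIVITY","1")].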

Answer 1 (score: 1)

If the number of "\n" characters can vary, I would use this version (rather than adding an extra parser):

parseProperties :: Parser (String,String)
parseProperties = do
  skipMany newline -- optional newline(s)
  char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")
  char '>'
  newline
  propValue <- many1 (noneOf "\n")
  skipMany newline  -- optional newline(s)
  return (propName,propValue)

I tried this version:

 parseTest (many1 parseProperties) "\n>  <CdId>\n1\n\n>  <Mol Weight>\n270.2369\n\n><Formula>\nC15H10O5\n\n>  <LOG_ER_RBA>\n-0.36\n\n>  <ACTIVITY>\n1\n\n" 

and got:

[("CdId","1"),("Mol Weight","270.2369"),("Formula","C15H10O5"),("LOG_ER_RBA","-0.36"),("ACTIVITY","1")]
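One caveat with many1 on its own: if a later property is malformed, many1 simply stops there and returns what it has parsed so far, without reporting the rest of the input. A possible guard, sketched below, is to require eof after many1. The wrapper name parseAllProperties and the source name "properties" are assumptions for this example, not part of the answer's original code.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The parser from this answer: newlines are optional on both sides.
parseProperties :: Parser (String, String)
parseProperties = do
  skipMany newline
  _ <- char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")
  _ <- char '>'
  _ <- newline
  propValue <- many1 (noneOf "\n")
  skipMany newline
  return (propName, propValue)

-- Hypothetical wrapper: parse one or more properties and insist that
-- nothing is left over, so trailing garbage becomes a parse error.
parseAllProperties :: String -> Either ParseError [(String, String)]
parseAllProperties = parse (many1 parseProperties <* eof) "properties"
```

For example, parseAllProperties "\n>  <CdId>\n1\n\n><Formula>\nC15H10O5\n\n" should succeed with both properties, while an input that ends in stray text such as "\n>  <CdId>\n1\n\noops" should yield a Left with a parse error instead of silently dropping "oops".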