I have run into behavior I don't understand with a Parsec parser. I want to parse a string like the following:
> <CdId>
1
> <Mol Weight>
270.2369
> <Formula>
C15H10O5
> <LOG_ER_RBA>
-0.36
> <ACTIVITY>
1
I wrote a parser:
parseProperties = do
  skipMany1 newline
  char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")
  char '>'
  newline
  propValue <- many1 (noneOf "\n")
  return (propName,propValue)
This parser handles a single item fine, and it can also parse a fixed number of them:
parseTest (count 5 parseProperties) "\n> <CdId>\n1\n\n> <Mol Weight>\n270.2369\n\n> <Formula>\nC15H10O5\n\n> <LOG_ER_RBA>\n-0.36\n\n> <ACTIVITY>\n1\n\n"
Result:
[("CdId","1"),("Mol Weight","270.2369"),("Formula","C15H10O5"),("LOG_ER_RBA","-0.36"),("ACTIVITY","1")]
However, I can't find a way to parse an arbitrary number of properties. If I try
parseTest (many1 parseProperties) "\n> <CdId>\n1\n\n> <Mol Weight>\n270.2369\n\n> <Formula>\nC15H10O5\n\n> <LOG_ER_RBA>\n-0.36\n\n> <ACTIVITY>\n1\n\n"
or
parseTest (manyTill parseProperties (try eof)) "\n> <CdId>\n1\n\n> <Mol Weight>\n270.2369\n\n> <Formula>\nC15H10O5\n\n> <LOG_ER_RBA>\n-0.36\n\n> <ACTIVITY>\n1\n\n"
the parser fails:
parse error at (line 17, column 1):
unexpected end of input
expecting new-line or ">"
However, if I use the anyChar parser instead, it does not fail:
parseTest (manyTill anyChar (try eof)) "\n> <CdId>\n1\n\n> <Mol Weight>\n270.2369\n\n> <Formula>\nC15H10O5\n\n> <LOG_ER_RBA>\n-0.36\n\n> <ACTIVITY>\n1\n\n"
"\n> <CdId>\n1\n\n> <Mol Weight>\n270.2369\n\n> <Formula>\nC15H10O5\n\n> <LOG_ER_RBA>\n-0.36\n\n> <ACTIVITY>\n1\n\n"
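Answer 0 (score: 2)
In your example, the parseProperties parser is run repeatedly until eof is encountered. The problem is that parseProperties does not consume the trailing whitespace in your input, so after the last property has been parsed the remaining string is "\n\n". That does not trigger your termination condition, because it is not the end of input. As a result parseProperties is attempted once more: it consumes the newlines but then fails when it tries to consume '>'.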
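To see the leftover input directly, here is a minimal sketch that parses a fixed number of properties and then returns whatever has not been consumed, using Parsec's getInput. The shortened sample string and the name leftover are assumptions made for illustration; parseProperties is the version from the question with a type signature added.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The parser from the question, unchanged apart from the signature
-- and the explicit discarding of unused results.
parseProperties :: Parser (String, String)
parseProperties = do
  skipMany1 newline
  _ <- char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")
  _ <- char '>'
  _ <- newline
  propValue <- many1 (noneOf "\n")
  return (propName, propValue)

-- Shortened sample (an assumption for illustration): two properties
-- followed by the same trailing "\n\n" as in the question.
sample :: String
sample = "\n> <CdId>\n1\n\n> <ACTIVITY>\n1\n\n"

-- Parse exactly two properties, then return the unconsumed input.
leftover :: Either ParseError String
leftover = parse (count 2 parseProperties >> getInput) "" sample

main :: IO ()
main = print leftover
```

The remaining input is the two trailing newlines, which is why an eof check made at that point does not succeed.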
Try changing your parseTest call to the following:
test = "\n> <CdId>\n1\n\n> <Mol Weight>\n270.2369\n\n> <Formula>\nC15H10O5\n\n> <LOG_ER_RBA>\n-0.36\n\n> <ACTIVITY>\n1\n\n"
parseTest (manyTill parseProperties $ try (skipMany newline >> eof)) test
This strips any leading newlines before checking whether the end of input has been reached.
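The many1 attempt from the question fails for a related reason: in Parsec, many and many1 only stop cleanly when the element parser fails without consuming input, and here the last attempt has already consumed the trailing newlines before failing on '>'. One possible alternative, sketched below and not part of the original answer, is to wrap the property parser in try so the failed final attempt gives its input back, and then skip the trailing newlines explicitly. The name parseAllProperties is hypothetical.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The parser from the question, with a type signature added.
parseProperties :: Parser (String, String)
parseProperties = do
  skipMany1 newline
  _ <- char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")
  _ <- char '>'
  _ <- newline
  propValue <- many1 (noneOf "\n")
  return (propName, propValue)

-- Hypothetical wrapper: try rewinds the newlines consumed by the
-- failed last attempt, so many1 stops instead of reporting an error.
parseAllProperties :: Parser [(String, String)]
parseAllProperties = do
  props <- many1 (try parseProperties)
  skipMany newline  -- drop the trailing blank lines
  eof               -- require that nothing else remains
  return props

test :: String
test = "\n> <CdId>\n1\n\n> <Mol Weight>\n270.2369\n\n> <Formula>\nC15H10O5\n\n> <LOG_ER_RBA>\n-0.36\n\n> <ACTIVITY>\n1\n\n"

main :: IO ()
main = parseTest parseAllProperties test
```

A trade-off of this approach is that try can make error messages less precise, since a malformed property in the middle of the input is reported only when the final eof check fails.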
Answer 1 (score: 1)
If the number of "\n" characters is arbitrary, I would use this version (instead of adding an extra parser):
parseProperties :: Parser (String,String)
parseProperties = do
  skipMany newline -- optional newline(s)
  char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")
  char '>'
  newline
  propValue <- many1 (noneOf "\n")
  skipMany newline -- optional newline(s)
  return (propName,propValue)
I tried this version:
parseTest (many1 parseProperties) "\n> <CdId>\n1\n\n> <Mol Weight>\n270.2369\n\n><Formula>\nC15H10O5\n\n> <LOG_ER_RBA>\n-0.36\n\n> <ACTIVITY>\n1\n\n"
and got:
[("CdId","1"),("Mol Weight","270.2369"),("Formula","C15H10O5"),("LOG_ER_RBA","-0.36"),("ACTIVITY","1")]
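Because this version consumes the newlines after each value, a further attempt at the end of input fails without consuming anything, which lets many1 stop normally. A minimal sketch of how one might additionally check that the whole input was consumed follows; the eof check, the name parseFile, and the sample string with an uneven number of newlines are assumptions added for illustration.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- The parser from this answer, unchanged apart from layout and the
-- explicit discarding of unused results.
parseProperties :: Parser (String, String)
parseProperties = do
  skipMany newline -- optional newline(s)
  _ <- char '>' >> spaces >> char '<'
  propName <- many1 (noneOf ">")
  _ <- char '>'
  _ <- newline
  propValue <- many1 (noneOf "\n")
  skipMany newline -- optional newline(s)
  return (propName, propValue)

-- Hypothetical top-level parser: one or more properties, then end of input.
parseFile :: Parser [(String, String)]
parseFile = many1 parseProperties <* eof

-- Sample with an uneven number of newlines between and after entries.
sample :: String
sample = "> <CdId>\n1\n> <Formula>\nC15H10O5\n\n\n"

main :: IO ()
main = print (parse parseFile "" sample)
```

Adding eof turns trailing garbage into a parse error instead of silently ignoring it, which many1 on its own would not do.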