Parsec fails to parse if characters follow my string

Time: 2014-02-21 16:02:37

Tags: haskell parsec

I am trying to write something to parse my Django templates, but my parser fails if anything follows {% endblock %}.

Here is what I have so far:

import Control.Monad
import Text.ParserCombinators.Parsec


data Piece = StaticPiece String 
           | BlockPiece String [Piece]
           | VarPiece String
  deriving (Show)

noWhitespace = many1 $ oneOf "_" <|> alphaNum

parseBlock = do
  blockName <- between (string "{% block" >> spaces) (spaces >> string "%}") noWhitespace <?> "block tag"
  blockContent <- many (parsePiece (void $ try $ (string "{% endblock %}")))
  return $ BlockPiece blockName blockContent

parseVar = do
  var <- between (string "{{" >> spaces) (spaces >> string "}}") noWhitespace <?> "variable"
  return $ VarPiece var

parseStatic end = do
  s <- manyTill (anyChar) $ end <|> (void $ lookAhead $ try $ parseNonStatic)
  return $ StaticPiece s 

parseNonStatic = try parseBlock <|> parseVar
parsePiece s = try parseNonStatic <|> (parseStatic s)

parsePieces = manyTill (parsePiece eof) eof

main :: IO ()
main = do
  putStrLn "1"
  print $ parse parsePieces "" "Blah blah blah"
  putStrLn "2"
  print $ parse parsePieces "" "{{ some_var }} string {{ other_var }} s"
  putStrLn "3"
  print $ parse parsePieces "" "{% block body %}{% endblock %}"
  putStrLn "4"
  print $ parse parsePieces "" "{% block body %}{{ hello }}{% endblock %}"
  putStrLn "5"
  print $ parse parsePieces "" "{% block body %}{% {% endblock %}"
  putStrLn "6"
  print $ parse parseBlock ""  "{% block body %}{% endblock %} "
  putStrLn "7"
  print $ parse parsePieces "" "{% block body %} {} { {{ }{ {{{}} cool } {% block inner_body %} Hello: {{ hello }}{% endblock %} {% endblock %}"
  putStrLn "8"
  print $ parse parsePieces "" "{% block body %} {} {{ cool }} {% block inner_body %} Hello: {{ hello }}{% endblock %}{% endblock %} ldsakjf"
  print ">>"
  --
  print $ parse parseBlock ""  "{% block body %}{% endblock %} "

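I am wondering whether there is a way to look at the string from end to beginning instead of from beginning to end. If you look at #7, the StaticPiece " " ends up inside the innermost block, but it should be in the body block. Any help would be greatly appreciated.

Edit: the code above outputs: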

1
Right [StaticPiece "Blah blah blah"]
2
Right [VarPiece "some_var",StaticPiece " string ",VarPiece "other_var",StaticPiece " s"]
3
Right [BlockPiece "body" [StaticPiece ""]]
4
Right [BlockPiece "body" [VarPiece "hello",StaticPiece ""]]
5
Right [BlockPiece "body" [StaticPiece "{% "]]
6
Left (line 1, column 32):
unexpected end of input
expecting "{% endblock %}", block tag or variable
7
Right [BlockPiece "body" [StaticPiece " {} { {{ }{ {{{}} cool } ",BlockPiece "inner_body" [StaticPiece " Hello: ",VarPiece "hello",StaticPiece "",StaticPiece " "]]]
8
Right [StaticPiece "{% block body %} {} ",VarPiece "cool",StaticPiece " {% block inner_body %} Hello: ",VarPiece "hello",StaticPiece "{% endblock %}{% endblock %} ldsakjf"]
">>"
Left (line 1, column 32):
unexpected end of input
expecting "{% endblock %}", block tag or variable

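2 Answers:

Answer 0: (score: 2)

Let's rewrite a few of the parsers to get things running smoothly.

Use manyTill to parse blocks with a matching endblock tag

First, we need parsers that match {% something or other %}, so let's make that a function: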

tag p = between (string "{%" >> spaces) (spaces >> string "%}") p <?> "tag"
ghci> parse (tag $ string "any parser here") "" "{% any parser here %}"
Right "any parser here"
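As a minimal sketch of how tag composes with other parsers, the block below wraps two different inner parsers. The names identifier, blockHeader and endBlock are hypothetical helpers introduced here for illustration, and the explicit Parser signatures are an addition not present in the answer.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Wrap any inner parser p in "{%" ... "%}", skipping whitespace on both sides.
tag :: Parser a -> Parser a
tag p = between (string "{%" >> spaces) (spaces >> string "%}") p <?> "tag"

-- Hypothetical helper: one or more underscores or alphanumeric characters.
identifier :: Parser String
identifier = many1 (char '_' <|> alphaNum)

-- Hypothetical helper: an opening block tag, returning the block name.
blockHeader :: Parser String
blockHeader = tag (string "block" >> spaces >> identifier)

-- Hypothetical helper: a closing tag, returning nothing of interest.
endBlock :: Parser ()
endBlock = tag (string "endblock") >> return ()
```

ghci> parse blockHeader "" "{% block body %}"
Right "body"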

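Let's use manyTill in parseBlock to pick up the endblock tag. I am still using try, because tag (string "endblock") can fail after it has already read some input, for example the { at the start of a variable or of some other non-tag.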

parseBlock = do
  blockName <- tag (string "block" >> spaces >> noWhitespace) <?> "block tag"
  blockContent <- manyTill parsePiece (try $ tag $ string "endblock") 
  return $ BlockPiece blockName blockContent
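The role of try here can be seen in isolation. The sketch below is an illustration rather than part of the answer's parser: withTry and withoutTry are hypothetical names, and anyChar stands in for parsePiece to keep the example self-contained.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

tag :: Parser a -> Parser a
tag p = between (string "{%" >> spaces) (spaces >> string "%}") p <?> "tag"

-- With try: a false start such as "{% b" is rewound, so manyTill can
-- fall back to consuming one more character and carry on.
withTry :: Parser String
withTry = manyTill anyChar (try (tag (string "endblock")))

-- Without try: the end parser consumes "{% " before failing, and manyTill
-- does not attempt the alternative after a failure that consumed input.
withoutTry :: Parser String
withoutTry = manyTill anyChar (tag (string "endblock"))
```

On the input "a {% b {% endblock %} rest", withTry yields Right "a {% b ", while withoutTry returns a Left because the first "{% " is consumed before the mismatch is noticed.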

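parseStatic mustn't match just anything, and should pause to check for tags/variables

parseStatic is the source of most of this parser's problems: it accepts anything that is not a tag or a var, which is always troublesome. Parsers are much better at following rules than at being libertarians.

We need to stop parseStatic from simply eating the rest of the input, so that the non-static parsers get a chance to try again. Let's make a parser that looks at the next character without consuming it. Using a single character like this avoids a lot of backtracking, although we'll see later that there is some combining to do.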

peekChar = void . try . lookAhead . char

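parseStatic also mustn't match the empty string. Parsers that match the empty string must not be used with any of the many combinators, because they would allow infinite parses like [StaticPiece "",StaticPiece "",StaticPiece ""..]. That is why we accept one character of any kind (including {), followed by any characters that are not {. The only thing other than { that can end a StaticPiece is the end of the input, which is why eof is allowed.
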
parseStatic = do
  c <- anyChar
  s <- manyTill anyChar (peekChar '{' <|> eof)
  return $ StaticPiece (c:s) 

So we get

ghci> parse parseStatic "" "some stuff not containing { other stuff"
Right (StaticPiece "some stuff not containing ")
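To make the stopping rule concrete, here is a small self-contained sketch of the same idea that returns plain strings instead of a StaticPiece. The name staticText and the type signatures are assumptions added for illustration.

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

-- Succeeds without consuming input if the next character is c.
peekChar :: Char -> Parser ()
peekChar = void . try . lookAhead . char

-- One character of any kind, then everything up to (not including)
-- the next '{' or the end of input. Never matches the empty string.
staticText :: Parser String
staticText = do
  c <- anyChar
  s <- manyTill anyChar (peekChar '{' <|> eof)
  return (c : s)
```

ghci> parse (many staticText) "" "a{b{{c"
Right ["a","{b","{","{c"]

Each chunk ends just before a {, which is why a run of braces produces several short pieces.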

Glue those statics together

With

parsePieces = manyTill parsePiece eof

we now get good parses like

ghci> parse parsePieces "" "{{ some_var }} string {{ other_var }} s"
Right [VarPiece "some_var",StaticPiece " string ",VarPiece "other_var",StaticPiece " s"]

but also ugly ones like

ghci> parse parsePieces "" "{% block body %} {} { {{ }{ {{{}} cool } {% block inner_body %} Hello: {{ hello }}{% endblock %} {% endblock %}"
Right [BlockPiece "body" [StaticPiece " ",StaticPiece "{} ",StaticPiece "{ ",StaticPiece "{",StaticPiece "{ }",StaticPiece "{ ",StaticPiece "{",StaticPiece "{",StaticPiece "{}} cool } ",BlockPiece "inner_body" [StaticPiece " Hello: ",VarPiece "hello"],StaticPiece " "]]

because parseStatic stops every time it hits a {. Let's turn adjacent statics into a single static with some helper functions:

isStatic :: Piece -> Bool
isStatic (StaticPiece _) = True
isStatic _ = False

unStatic :: Piece -> String
unStatic (StaticPiece s) = s
unStatic _ = error "unStatic: applied to something other than a StaticPiece"

We'll use span :: (a -> Bool) -> [a] -> ([a], [a]) to collect the non-statics and concatenate the statics:

combineStatics :: [Piece] -> [Piece]
combineStatics pieces = let (nonstatics,therest) = span (not.isStatic) pieces in
    nonstatics ++ combine therest where
      combine [] = []
      combine ps = let (statics,more) = span isStatic ps in
        (StaticPiece . concat . map unStatic) statics : combineStatics more

and rewrite parseBlock to combine any statics in its block content:

parseBlock = do
  blockName <- tag (string "block" >> spaces >> noWhitespace) <?> "block tag"
  blockContent <- manyTill parsePiece (try $ tag $ string "endblock")
  return $ BlockPiece blockName (combineStatics blockContent)

Now it works nicely

With this change the tests run just as I imagine you'd like.
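For reference, the pieces of this answer can be assembled into one module roughly as follows. This is a sketch under assumptions: the answer does not show its own parsePiece, so the argument-free parsePiece below is adapted from the question, and the type signatures and the Eq instance are additions.

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

data Piece = StaticPiece String | BlockPiece String [Piece] | VarPiece String
  deriving (Show, Eq)

noWhitespace :: Parser String
noWhitespace = many1 (oneOf "_" <|> alphaNum)

tag :: Parser a -> Parser a
tag p = between (string "{%" >> spaces) (spaces >> string "%}") p <?> "tag"

peekChar :: Char -> Parser ()
peekChar = void . try . lookAhead . char

parseBlock :: Parser Piece
parseBlock = do
  blockName <- tag (string "block" >> spaces >> noWhitespace) <?> "block tag"
  blockContent <- manyTill parsePiece (try $ tag $ string "endblock")
  return $ BlockPiece blockName (combineStatics blockContent)

parseVar :: Parser Piece
parseVar = VarPiece <$>
  between (string "{{" >> spaces) (spaces >> string "}}") noWhitespace <?> "variable"

parseStatic :: Parser Piece
parseStatic = do
  c <- anyChar
  s <- manyTill anyChar (peekChar '{' <|> eof)
  return $ StaticPiece (c : s)

-- Assumption: parsePiece as in the question, minus the end-parser argument.
parsePiece :: Parser Piece
parsePiece = try (try parseBlock <|> parseVar) <|> parseStatic

parsePieces :: Parser [Piece]
parsePieces = manyTill parsePiece eof

isStatic :: Piece -> Bool
isStatic (StaticPiece _) = True
isStatic _ = False

-- Merge each run of adjacent StaticPieces into a single StaticPiece.
combineStatics :: [Piece] -> [Piece]
combineStatics pieces = nonstatics ++ combine rest
  where
    (nonstatics, rest) = span (not . isStatic) pieces
    combine [] = []
    combine ps =
      let (statics, more) = span isStatic ps
      in StaticPiece (concat [s | StaticPiece s <- statics]) : combineStatics more
```

ghci> parse parsePieces "" "{% block body %}{% endblock %} "
Right [BlockPiece "body" [],StaticPiece " "]

Trailing input after the closing tag now becomes an ordinary StaticPiece. Note that in this sketch only block contents pass through combineStatics, so adjacent statics at the top level are left unmerged.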

Answer 1: (score: 0)

I think I figured it out.

I changed the code so that parseBlock is the one that consumes {% endblock %}, rather than parseStatic.

parseBlockContent end = 
  manyTill (parsePiece (lookAhead $ try $ end)) (try $ end)

parseBlock = do
  blockName <- parseTemplateTag (string "block") wordString <?> "block tag"
  blockContent <- parseBlockContent (void $ string "{% endblock %}")
  return $ BlockPiece blockName blockContent

It would be nice if it did not need to backtrack so much, especially since parseStatic has to consume an entire {% block %} {% endblock %} to tell whether it should continue.
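The core of this change, static text that stops in front of the end tag while the enclosing block consumes it, can be sketched in a simplified form. The names endTag, staticUntil and blockBody are hypothetical, and the sketch handles plain text only; it is not this answer's parsePiece.

```haskell
import Control.Monad (void)
import Text.Parsec
import Text.Parsec.String (Parser)

endTag :: Parser ()
endTag = void (string "{% endblock %}")

-- Collect characters until the end parser would succeed (or input ends).
-- lookAhead leaves the end tag unconsumed; try rewinds partial matches.
staticUntil :: Parser () -> Parser String
staticUntil end = manyTill anyChar (lookAhead (try end) <|> eof)

-- The block, not the static parser, is responsible for eating the end tag.
blockBody :: Parser String
blockBody = do
  body <- staticUntil endTag
  endTag
  return body
```

ghci> parse blockBody "" "hi {% x {% endblock %} tail"
Right "hi {% x "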