I'm trying to write something to parse my Django templates, but if {% endblock %}
Here is what I have so far:
import Control.Monad
import Text.ParserCombinators.Parsec

data Piece = StaticPiece String
           | BlockPiece String [Piece]
           | VarPiece String
           deriving (Show)

noWhitespace = many1 $ oneOf "_" <|> alphaNum

parseBlock = do
    blockName <- between (string "{% block" >> spaces) (spaces >> string "%}") noWhitespace <?> "block tag"
    blockContent <- many (parsePiece (void $ try $ (string "{% endblock %}")))
    return $ BlockPiece blockName blockContent

parseVar = do
    var <- between (string "{{" >> spaces) (spaces >> string "}}") noWhitespace <?> "variable"
    return $ VarPiece var

parseStatic end = do
    s <- manyTill (anyChar) $ end <|> (void $ lookAhead $ try $ parseNonStatic)
    return $ StaticPiece s

parseNonStatic = try parseBlock <|> parseVar
parsePiece s = try parseNonStatic <|> (parseStatic s)
parsePieces = manyTill (parsePiece eof) eof

main :: IO ()
main = do
    putStrLn "1"
    print $ parse parsePieces "" "Blah blah blah"
    putStrLn "2"
    print $ parse parsePieces "" "{{ some_var }} string {{ other_var }} s"
    putStrLn "3"
    print $ parse parsePieces "" "{% block body %}{% endblock %}"
    putStrLn "4"
    print $ parse parsePieces "" "{% block body %}{{ hello }}{% endblock %}"
    putStrLn "5"
    print $ parse parsePieces "" "{% block body %}{% {% endblock %}"
    putStrLn "6"
    print $ parse parseBlock "" "{% block body %}{% endblock %} "
    putStrLn "7"
    print $ parse parsePieces "" "{% block body %} {} { {{ }{ {{{}} cool } {% block inner_body %} Hello: {{ hello }}{% endblock %} {% endblock %}"
    putStrLn "8"
    print $ parse parsePieces "" "{% block body %} {} {{ cool }} {% block inner_body %} Hello: {{ hello }}{% endblock %}{% endblock %} ldsakjf"
    print ">>"
    --
    print $ parse parseBlock "" "{% block body %}{% endblock %} "
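I was wondering whether there is some way to look at the string from the end backwards instead of from start to finish. If you look at #7, the StaticPiece " " ends up inside the innermost block, when it should be in the body block. Any help would be appreciated.
Edit: the code above outputs: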
1
Right [StaticPiece "Blah blah blah"]
2
Right [VarPiece "some_var",StaticPiece " string ",VarPiece "other_var",StaticPiece " s"]
3
Right [BlockPiece "body" [StaticPiece ""]]
4
Right [BlockPiece "body" [VarPiece "hello",StaticPiece ""]]
5
Right [BlockPiece "body" [StaticPiece "{% "]]
6
Left (line 1, column 32):
unexpected end of input
expecting "{% endblock %}", block tag or variable
7
Right [BlockPiece "body" [StaticPiece " {} { {{ }{ {{{}} cool } ",BlockPiece "inner_body" [StaticPiece " Hello: ",VarPiece "hello",StaticPiece "",StaticPiece " "]]]
8
Right [StaticPiece "{% block body %} {} ",VarPiece "cool",StaticPiece " {% block inner_body %} Hello: ",VarPiece "hello",StaticPiece "{% endblock %}{% endblock %} ldsakjf"]
">>"
Left (line 1, column 32):
unexpected end of input
expecting "{% endblock %}", block tag or variable
Answer 0 (score: 2)
Let's rewrite a few of the parsers to get things working.
First, we'll need parsers that match {% something or other %}, so let's make that a function:
tag p = between (string "{%" >> spaces) (spaces >> string "%}") p <?> "tag"
ghci> parse (tag $ string "any parser here") "" "{% any parser here %}"
Right "any parser here"
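Let's use manyTill in parseBlock to pick up the endblock tag. I'm still using try, because tag (string "endblock") can fail after it has already read some input, for example when it meets a { that starts a variable or something else that isn't a tag.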
parseBlock = do
    blockName <- tag (string "block" >> spaces >> noWhitespace) <?> "block tag"
    blockContent <- manyTill parsePiece (try $ tag $ string "endblock")
    return $ BlockPiece blockName blockContent
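parseStatic is the source of most of this parser's problems: it allows anything except a tag or a var, and that kind of rule is always troublesome. Parsers are much better at following rules than at being liberal.
We need to stop parseStatic from simply eating the rest of the input, so that the non-static parsers get a chance to try again. So let's make a parser that peeks at the next character without consuming it. Looking at a single character like this avoids a lot of backtracking, although we'll see later that it leaves some combining to do.
peekChar = void . try . lookAhead . char
parseStatic must also never match the empty string. A parser that can match the empty string must not be used with any of the many combinators, because that would allow infinite parses like [StaticPiece "",StaticPiece "",StaticPiece ""..].
That's why we first accept any character we like (including {), and then keep accepting characters as long as they aren't {. The only thing other than { that can terminate a StaticPiece is the end of the input, which is why eof is allowed.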
parseStatic = do
    c <- anyChar
    s <- manyTill anyChar (peekChar '{' <|> eof)
    return $ StaticPiece (c:s)
So we get
ghci> parse parseStatic "" "some stuff not containing { other stuff"
Right (StaticPiece "some stuff not containing ")
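To make the rule behind parseStatic concrete, here is a minimal model of it written against a plain String, with no Parsec involved. The names staticChunk and staticChunks are hypothetical and don't appear in the parser itself. The sketch only mirrors the rule (take one character unconditionally, then everything up to the next { or the end of the input) and ignores the tag and variable parsers entirely.

```haskell
import Data.List (unfoldr)

-- Hypothetical String-level model of parseStatic (not part of the parser).
-- Returns the chunk and the remaining input, or Nothing on empty input,
-- so it can never "succeed" without consuming at least one character.
staticChunk :: String -> Maybe (String, String)
staticChunk []       = Nothing
staticChunk (c:rest) =
  let (s, remaining) = break (== '{') rest  -- stop before the next '{' or at end of input
  in  Just (c : s, remaining)

-- Apply staticChunk repeatedly until the input is exhausted.
staticChunks :: String -> [String]
staticChunks = unfoldr staticChunk
```

For example, staticChunk "some stuff not containing { other stuff" gives Just ("some stuff not containing ","{ other stuff"), and staticChunks "{} { {{" gives ["{} ","{ ","{","{"]. A leading { is always accepted, and every later { ends the chunk. Because each step consumes at least one character, repeating it always terminates.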
With parsePieces redefined as
parsePieces = manyTill parsePiece eof
we now get nice parses like
ghci> parse parsePieces "" "{{ some_var }} string {{ other_var }} s"
Right [VarPiece "some_var",StaticPiece " string ",VarPiece "other_var",StaticPiece " s"]
but also ugly ones like
ghci> parse parsePieces "" "{% block body %} {} { {{ }{ {{{}} cool } {% block inner_body %} Hello: {{ hello }}{% endblock %} {% endblock %}"
Right [BlockPiece "body" [StaticPiece " ",StaticPiece "{} ",StaticPiece "{ ",StaticPiece "{",StaticPiece "{ }",StaticPiece "{ ",StaticPiece "{",StaticPiece "{",StaticPiece "{}} cool } ",BlockPiece "inner_body" [StaticPiece " Hello: ",VarPiece "hello"],StaticPiece " "]]
because parseStatic stops every time it hits a {. Let's merge adjacent statics into a single one, with the help of a few helper functions:
isStatic :: Piece -> Bool
isStatic (StaticPiece _) = True
isStatic _ = False

unStatic :: Piece -> String
unStatic (StaticPiece s) = s
unStatic _ = error "unStatic: applied to something other than a StaticPiece"
We'll use span :: (a -> Bool) -> [a] -> ([a], [a]) to collect the non-statics and concatenate the statics:
combineStatics :: [Piece] -> [Piece]
combineStatics pieces = let (nonstatics,therest) = span (not.isStatic) pieces in
    nonstatics ++ combine therest where
        combine [] = []
        combine ps = let (statics,more) = span isStatic ps in
            (StaticPiece . concat . map unStatic) statics : combineStatics more
and rewrite parseBlock so that it combines any statics in its block content:
parseBlock = do
    blockName <- tag (string "block" >> spaces >> noWhitespace) <?> "block tag"
    blockContent <- manyTill parsePiece (try $ tag $ string "endblock")
    return $ BlockPiece blockName (combineStatics blockContent)
The tests now run the way I'd expect.
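As an alternative sketch, the merging step could also be written as a single right fold, which avoids the partial unStatic helper. The name mergeStatics is hypothetical. The Piece type is repeated here only so the snippet compiles on its own, and the Eq instance is an added assumption to make results easy to compare.

```haskell
data Piece = StaticPiece String
           | BlockPiece String [Piece]
           | VarPiece String
           deriving (Show, Eq)

-- Hypothetical alternative to combineStatics: glue each StaticPiece onto
-- an immediately following StaticPiece and leave every other piece untouched.
mergeStatics :: [Piece] -> [Piece]
mergeStatics = foldr step []
  where
    -- Two statics in a row: concatenate their text into one piece.
    step (StaticPiece a) (StaticPiece b : rest) = StaticPiece (a ++ b) : rest
    -- Anything else is kept as it is.
    step p               acc                    = p : acc
```

For instance, mergeStatics [StaticPiece " ", StaticPiece "{} ", VarPiece "cool", StaticPiece " "] evaluates to [StaticPiece " {} ",VarPiece "cool",StaticPiece " "]. Like combineStatics, it only merges one level of the list, which should be enough if it is called from parseBlock, since each nested block would then be merged when its own parseBlock finishes.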
Answer 1 (score: 0)
I think I figured it out.
I changed the code so that parseBlock, rather than parseStatic, is the one that consumes {% endblock %}.
parseBlockContent end =
    manyTill (parsePiece (lookAhead $ try $ end)) (try $ end)

parseBlock = do
    blockName <- parseTemplateTag (string "block") wordString <?> "block tag"
    blockContent <- parseBlockContent (void $ string "{% endblock %}")
    return $ BlockPiece blockName blockContent
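It would be nice if it didn't need to backtrack so much, especially since parseStatic has to consume the whole {% block %} {% endblock %} just to tell whether it should keep going.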