AST and parsing in Haskell

Time: 2014-10-07 07:30:36

Tags: parsing haskell abstract-syntax-tree

I have an assignment, and I can't figure out how to define the answer.

Assignment

Write a function exp :: [String] -> (AST, [String])

AST

  • If x is a number, it should say Number x
  • If it is a "+" or a "-", it should say Atom x
  • If it reads a "(", then everything after the "(" up to the matching ")" should be a List [AST]

So that the output is:

exp (token "(hi (4) 32)")
> (List [Atom "hi", List [Number 4], Number 32], [])

exp (token "(+ 3 42 654 2)") 
> (List [Atom "+", Number 3, Number 42, Number 654, Number 2], [])

exp (token "(+ 21 444) junk") 
> (List [Atom "+", Number 21, Number 444], ["junk"])

What I have so far

I already have a tokenizer function token :: String -> [String] that turns the input into a list of tokens.

Example:
`token "( + 2 ( + 2 3 ) )"
> ["(","+","2","(","+","2","3",")",")"]`

The exp function looks like this:

exp :: [String] -> (AST, [String])
exp [] = error "Empty list"
exp (x:xs)  | x == ")"      = error ""
            | x == "("      = let (e, ss') = exp xs in (List [getAst xs], ss')
            | x == "+"      = let (e, ss') = exp xs in (Atom (read x), ss')
            | x == "-"      = let (e, ss') = exp xs in (Atom (read x), ss')
            | otherwise     = exp xs

The getAst function:

getAst :: [String] -> AST
getAst [] = error ""
getAst (x:xs)
            | x == ")"  = error ""
            | x == "("  = (List [getAst xs])
            | isAtom x  = (Atom x) 
            | isNum x   = (Number (read x))
            | otherwise = getAst xs

(Yes, I'm a beginner in Haskell...)

1 answer:

Answer 0: (score: 6)

I think I can try to help you.

The way the problem is stated, you should be able to do this just by looking at the next input token and deciding from there where to go.

Some assumptions

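From the signature [String] -> (Ast, [String]), I assume this is the usual kind of parser: it tries to read some part of the input and returns the parsed/transformed output together with the rest of the input it did not consume (hence the tuple with two parts, the Ast and the remaining input).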
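To make that shape concrete, here is a minimal sketch. The alias Parser and the helpers oneToken and twoTokens are made-up names for illustration only; they are not part of the assignment.

```haskell
-- Hypothetical alias (an assumption, not from the assignment): a parser
-- consumes a prefix of the token list and returns a result together with
-- the tokens it did not consume.
type Parser a = [String] -> (a, [String])

-- Illustrative parser: takes exactly one token and returns it unchanged.
-- It fails with `error` on empty input, like the code in the question.
oneToken :: Parser String
oneToken []       = error "no input"
oneToken (t : ts) = (t, ts)

-- Running two parsers in sequence means feeding the leftover tokens of the
-- first parser into the second one.
twoTokens :: Parser (String, String)
twoTokens input =
  let (a, rest)  = oneToken input
      (b, rest') = oneToken rest
  in ((a, b), rest')
```

Threading the leftover tokens from one call into the next is the same thing the let (el, rest') = ... bindings in this answer do.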

The AST type

Since you did not include it, I assume it is:

data Ast
  = Number Int
  | Atom String
  | List [Ast]
  deriving Show

Some things I need

I need a few things:

import Prelude hiding (exp)

import Control.Applicative ((<$>))
import Data.Maybe (fromJust, isJust)

I have to hide exp from the Prelude because we want to use it as our own function name.

Then I want to fmap over a Maybe, so I include the <$> operator from Control.Applicative. In case you have not seen it before, for Maybe it is really just this:

f <$> Nothing = Nothing
f <$> Just a  = Just (f a)
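As a small illustration of the operator in use (halve and incremented are made-up names for this sketch):

```haskell
import Data.Functor ((<$>))

-- Made-up helper for this illustration: succeeds only for even numbers.
halve :: Int -> Maybe Int
halve n
  | even n    = Just (n `div` 2)
  | otherwise = Nothing

-- <$> applies (+ 1) inside the Just and leaves Nothing untouched.
incremented :: Int -> Maybe Int
incremented n = (+ 1) <$> halve n
```

In the parser, the same operator wraps a successfully read Int in the Number constructor.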

I also want some helpers for Maybe:

  • isJust to check whether a value is a Just _
  • fromJust to get the a out of a Just a
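For comparison, a minimal sketch of the isJust/fromJust style next to plain pattern matching, which does the same job without a partial function. Both function names are made up for this example.

```haskell
import Data.Maybe (fromJust, isJust)

-- Style used in this answer: test with isJust, then unwrap with fromJust.
-- fromJust crashes on Nothing, so the isJust guard has to come first.
describeGuarded :: Maybe Int -> String
describeGuarded m
  | isJust m  = "got " ++ show (fromJust m)
  | otherwise = "nothing"

-- Equivalent with pattern matching; no partial function is needed.
describeMatched :: Maybe Int -> String
describeMatched (Just n) = "got " ++ show n
describeMatched Nothing  = "nothing"
```

The guard style fits nicely when the check is one branch among several guards, which is how it is used in exp.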

Finally, I need this helper function to make read a bit safer:

tryRead :: (Read a) => String -> Maybe a
tryRead input =
  case readsPrec 0 input of
    (a,_):_ -> Just a
    _       -> Nothing

Here this is used to try to read a number: if the token is a number n it returns Just n, otherwise Nothing.
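One thing to be aware of: tryRead takes the first parse and ignores any leftover characters, so a token such as "12abc" would be accepted as 12. If that matters, readMaybe from Text.Read (in base) only succeeds when the whole string is consumed. A sketch of the two side by side, where tryReadStrict is a made-up name:

```haskell
import Text.Read (readMaybe)

-- The helper from this answer, repeated so the example is self-contained.
tryRead :: Read a => String -> Maybe a
tryRead input =
  case readsPrec 0 input of
    (a, _) : _ -> Just a
    _          -> Nothing

-- Stricter alternative (a swapped-in technique, not the answer's code):
-- succeeds only if the entire token parses.
tryReadStrict :: Read a => String -> Maybe a
tryReadStrict = readMaybe

-- Both readers applied to the same tokens, for comparison in GHCi.
comparison :: [(String, Maybe Int, Maybe Int)]
comparison = [ (s, tryRead s, tryReadStrict s) | s <- ["42", "hi", "12abc"] ]
```

For the inputs in the question both versions behave the same, so this is only a matter of how strict you want the tokens to be.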

First go

Here is an unfinished first go at your problem:

exp :: [String] -> (Ast, [String])
exp (lookat:rest)
  | isJust number = (fromJust number, rest)
  | lookat == "("  = parseList rest []
  where number = Number <$> tryRead lookat

parseList :: [String] -> [Ast] -> (Ast, [String])
parseList inp@(lookat:rest) acc
  | lookat == ")" = (List (reverse acc), rest)
  | otherwise    = let (el, rest') = exp inp
                   in parseList rest' (el:acc)

As you can see, I just branch on lookat, with a slight twist:

If I see a number, I return the number and the rest of the token list. If I see a (, I start another parser, parseList.

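parseList works the same way:

  • it looks at the first token
  • if the token is ), it finishes the current list (it uses the accumulator technique) and returns
  • if not, it uses the existing exp parser recursively to get the next element of the list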

Here is an example run:

λ> let input = ["(", "2", "(", "3", "4", ")", "5", ")"]

λ> exp input
(List [Number 2,List [Number 3,Number 4],Number 5],[])
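The accumulator in parseList is one way to collect the elements. For comparison, here is a sketch of the same idea without an accumulator, where each call conses its element onto the result of the recursive call. The name elements is made up, the Atom fallback is my own assumption about how atoms should be handled, and readMaybe stands in for tryRead.

```haskell
import Prelude hiding (exp)
import Text.Read (readMaybe)

data Ast
  = Number Int
  | Atom String
  | List [Ast]
  deriving (Show, Eq)

-- Parses list elements up to the matching ")" without an accumulator.
elements :: [String] -> ([Ast], [String])
elements []           = error "Syntax error: `)` not found"
elements (")" : rest) = ([], rest)
elements inp          =
  let (el, rest)   = exp inp        -- parse one element
      (els, rest') = elements rest  -- then the remaining elements
  in (el : els, rest')

exp :: [String] -> (Ast, [String])
exp []           = error "Empty list"
exp ("(" : rest) = let (els, rest') = elements rest in (List els, rest')
exp (tok : rest) = case readMaybe tok of
  Just n  -> (Number n, rest)
  Nothing -> (Atom tok, rest)  -- assumption: anything else is an atom
```

This version needs no reverse, while the accumulator version is tail recursive; for inputs of this size either choice is fine.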

TODO

There are still some edge cases you have to decide on (what should happen if there are no input tokens?).

And of course you have to add the case for Atoms to finish the exercise.
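One possible way to settle those edge cases, sketched under my own assumptions: return Nothing instead of calling error, so that empty input, a stray ")" and a missing ")" are all reported to the caller. This changes the result type, so it only fits if the assignment allows it. The names expSafe and listSafe are made up, and readMaybe stands in for tryRead.

```haskell
import Text.Read (readMaybe)

data Ast
  = Number Int
  | Atom String
  | List [Ast]
  deriving (Show, Eq)

-- Total variant: Nothing signals empty input, a stray ")" or a missing ")".
expSafe :: [String] -> Maybe (Ast, [String])
expSafe []           = Nothing
expSafe (")" : _)    = Nothing
expSafe ("(" : rest) = listSafe rest []
-- A token that reads as an Int becomes a Number, anything else an Atom.
expSafe (tok : rest) = Just (maybe (Atom tok) Number (readMaybe tok), rest)

-- Collects elements with an accumulator until the matching ")" is found.
listSafe :: [String] -> [Ast] -> Maybe (Ast, [String])
listSafe []           _   = Nothing
listSafe (")" : rest) acc = Just (List (reverse acc), rest)
listSafe inp          acc = do
  (el, rest) <- expSafe inp  -- a failure here aborts the whole parse
  listSafe rest (el : acc)
```

The do block runs in the Maybe monad, so a failure anywhere inside a nested list propagates outwards without any extra checks.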

Complete solution

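OK, 3 hours later the OP has not checked in again, so I guess I can post a complete solution. I hope I did not forget any edge cases, and this is surely not the most efficient implementation (tokens comes to mind), but all the examples the OP gave match: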

module Ast where

import Prelude hiding (exp)

import Control.Applicative ((<$>))
import Data.Char (isSpace, isControl)
import Data.Maybe (fromJust, isJust)

data Ast
  = Number Int
  | Atom String
  | List [Ast]
  | Empty
  deriving Show

type Token = String

main :: IO ()
main = do
  print $ parse "(hi (4) 32)"
  print $ parse "(+ 3 42 654 2)"
  print $ parseAst . tokens $ "(+ 21 444) junk"

parse :: String -> Ast
parse = fst . parseAst . tokens

parseAst :: [Token] -> (Ast, [Token])
parseAst [] = (Empty, [])
parseAst (lookat:rest)
  | isJust number = (fromJust number, rest)
  | lookat == "("  = parseList rest []
  | otherwise     = (Atom lookat, rest)
  where number = Number <$> tryRead lookat

parseList :: [Token] -> [Ast] -> (Ast, [Token])
parseList [] _ = error "Syntax error: `)` not found"
parseList inp@(lookat:rest) acc
  | lookat == ")" = (List (reverse acc), rest)
  | otherwise    = let (el, rest') = parseAst inp
                   in parseList rest' (el:acc)

tokens :: String -> [Token]
tokens = split ""
  where split tok "" = add tok []
        split tok (c:cs)
          | c == '(' || c == ')' = add tok $ [c] : split "" cs
          | isSpace c || isControl c = add tok $ split "" cs
          | otherwise = split (tok ++ [c]) cs
        add "" tks = tks
        add t tks =  t : tks

tryRead :: (Read a) => Token -> Maybe a
tryRead input =
  case readsPrec 0 input of
    (a,_):_ -> Just a
    _       -> Nothing

Example run

λ> :main
List [Atom "hi",List [Number 4],Number 32]
List [Atom "+",Number 3,Number 42,Number 654,Number 2]
(List [Atom "+",Number 21,Number 444],["junk"])
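On the efficiency remark about tokens: the likely culprit is tok ++ [c], which walks the whole token built so far for every character. A sketch of an alternative that pads each parenthesis with spaces and lets the Prelude's words do the splitting; tokensViaWords is a made-up name, and this is a different technique from the tokens function above.

```haskell
-- Alternative tokenizer: surround every parenthesis with spaces, then split
-- on whitespace with `words`.
-- Assumption: splitting on whitespace is enough. Unlike `tokens` above, this
-- does not treat non-whitespace control characters as separators.
tokensViaWords :: String -> [String]
tokensViaWords = words . concatMap pad
  where
    pad c
      | c == '(' || c == ')' = [' ', c, ' ']
      | otherwise            = [c]
```

It should be usable as a drop-in replacement for tokens in parse, since Token is just a synonym for String.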