What is a pseudocode implementation of this context-free language enumerator?

Asked: 2014-01-09 04:50:17

Tags: algorithm haskell language-agnostic grammar context-free-grammar

This blog post explains how to enumerate a context-free grammar in Haskell with the aid of a monad called Omega.

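I can't follow how it works, partly because the post does not explain how that monad works, but mostly because I don't understand monads. What is a correct pseudocode explanation of this algorithm, without monads?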

A syntax similar to a simple, common language such as JavaScript or Python would be preferred.

1 Answer:

Answer 0: (score: 1)

Here is a version in Haskell without the monad. I do use list comprehensions, but those are more intuitive, and you can use them in Python too.

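The Omega type is just a wrapper around [], but it helps keep the concepts of "string of symbols" and "list of possible strings" separate. Since we are not going to use Omega for the "list of possible strings", we use a newtype wrapper for the "string of symbols" instead to keep everything straight: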

import Prelude hiding (String)

-- represent a sequence of symbols of type `a`,
-- i.e. a string recognised by a grammar over `a`
newtype String a = String [a]
    deriving Show

-- simple wrapper for (++) to also make things more explicit when we use it
joinStrings (String a1) (String a2) = String (a1 ++ a2)

Here is the Symbol type from the blog post:

data Symbol a
    = Terminal a
    | Nonterminal [[Symbol a]] -- a disjunction of juxtapositions

The core of the Omega monad is actually the diagonal function:

-- | This is the hinge algorithm of the Omega monad,
-- exposed because it can be useful on its own.  Joins 
-- a list of lists with the property that for every x y 
-- there is an n such that @xs !! x !! y == diagonal xs !! n@.
diagonal :: [[a]] -> [a]
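To make the property in that comment concrete, here is a deliberately simplified stand-in. It is not the Omega implementation: `diagonalByIndex`, `grid` and `firstSix` are names made up for this sketch, and it assumes that the outer list and every row are infinite (it indexes with `!!`, so it fails on a short row and is far slower than the real thing).

```haskell
-- A simplified stand-in for `diagonal`, NOT the Omega implementation.
-- Assumption: the outer list and every row are infinite.
-- All names below are illustrative and do not come from the blog post.
diagonalByIndex :: [[a]] -> [a]
diagonalByIndex xss =
    [ xss !! x !! (d - x)   -- the cell in row x, column d - x
    | d <- [0 ..]           -- d = x + y picks one anti-diagonal
    , x <- [0 .. d]         -- walk that anti-diagonal from the top row down
    ]

-- An infinite grid whose cell (x, y) holds its own coordinates.
grid :: [[(Int, Int)]]
grid = [[(x, y) | y <- [0 ..]] | x <- [0 ..]]

-- The first three anti-diagonals of the grid.
firstSix :: [(Int, Int)]
firstSix = take 6 (diagonalByIndex grid)
-- [(0,0),(0,1),(1,0),(0,2),(1,1),(2,0)]
```

In this sketch the cell at row x, column y lands at position n = d * (d + 1) / 2 + x with d = x + y, so every cell is reached after finitely many steps. That is the "for every x y there is an n" guarantee.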

Given diagonal, enumerate from the blog post is:

enumerate :: Symbol a -> Omega [a]
enumerate (Terminal a) = return [a]
enumerate (Nonterminal alts) = do
    alt <- each alts          -- for each alternative
      -- (each is the Omega constructor :: [a] -> Omega a)
    rep <- mapM enumerate alt -- enumerate each symbol in the sequence
    return $ concat rep       -- and concatenate the results

Our enumerate will have the following type:

enumerate :: Symbol a -> [String a]

The Terminal case is simple:

enumerate (Terminal a) = [String [a]]

For the Nonterminal case, a helper function that handles a single alternative will be useful:

-- Enumerate the strings accepted by a sequence of symbols
enumerateSymbols :: [Symbol a] -> [String a]

The base case is very simple, but note that the result is not [] but a singleton list containing the empty string:

enumerateSymbols [] = [String []]

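For the non-empty case, another helper will be useful: one that pairs up the strings from the head with the strings from the tail in all possible ways, using diagonal: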

crossProduct :: [a] -> [b] -> [(a, b)]
crossProduct as bs = diagonal [[(a, b) | b <- bs] | a <- as]

I could also have written [[(a, b) | a <- as] | b <- bs], but I chose the other form because it ends up reproducing the output from the blog post.
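To see what crossProduct buys over a plain nested comprehension, the sketch below repeats the diagonal body quoted near the end of this answer so that it loads on its own. The names `pairsFinite`, `pairsInfinite` and `nestedPairs` are illustrative additions, not part of the blog post.

```haskell
-- diagonal, copied from the Omega source quoted near the end of this answer
diagonal :: [[a]] -> [a]
diagonal = diagonal' 0
  where
    stripe 0 xss          = ([], xss)
    stripe _ []           = ([], [])
    stripe n ([]:xss)     = stripe n xss
    stripe n ((x:xs):xss) =
        let (nstripe, nlists) = stripe (n - 1) xss
        in (x : nstripe, xs : nlists)

    diagonal' _ []  = []
    diagonal' n xss =
        let (str, xss') = stripe n xss
        in str ++ diagonal' (n + 1) xss'

crossProduct :: [a] -> [b] -> [(a, b)]
crossProduct as bs = diagonal [[(a, b) | b <- bs] | a <- as]

-- Finite inputs: every pair appears exactly once, in diagonal order.
pairsFinite :: [(Int, Char)]
pairsFinite = crossProduct [1, 2, 3] "abc"
-- [(1,'a'),(1,'b'),(2,'a'),(1,'c'),(2,'b'),(3,'a'),(2,'c'),(3,'b'),(3,'c')]

-- Infinite inputs: both components keep advancing.
pairsInfinite :: [(Int, Int)]
pairsInfinite = take 6 (crossProduct [0 ..] [0 ..])
-- [(0,0),(0,1),(1,0),(0,2),(1,1),(2,0)]

-- For contrast, a nested comprehension never moves past a == 0.
nestedPairs :: [(Int, Int)]
nestedPairs = take 3 [(a, b) | a <- [0 ..], b <- [0 ..]]
-- [(0,0),(0,1),(0,2)]
```

The contrast in `nestedPairs` is the reason a fair interleaving matters here: a recursive grammar usually produces infinitely many strings for some symbol, and an unfair pairing would get stuck on the first prefix forever.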

Now we can write the non-empty case for enumerateSymbols:

enumerateSymbols (sym:syms) =
    let prefixes = enumerate sym
        suffixes = enumerateSymbols syms
    in [joinStrings prefix suffix 
           | (prefix, suffix) <- crossProduct prefixes suffixes]

Now for the Nonterminal case of enumerate:

enumerate (Nonterminal alts) =
    -- get the list of strings for each of the alternatives
    let choices = map enumerateSymbols alts
    -- and use diagonal to combine them in a "fair" way
    in diagonal choices

Here is the body of diagonal from the Omega source, along with my explanation:

diagonal = diagonal' 0
    where

    -- stripe n xss returns two lists,
    -- the first containing the head of each of the first n lists in xss,
    -- the second containing the tail of the first n lists in xss
    -- and all of the remaining lists in xss.
    -- empty lists in xss are ignored
    stripe 0 xss          = ([],xss)
    stripe n []           = ([],[])
    stripe n ([]:xss)     = stripe n xss
    stripe n ((x:xs):xss) = 
        let (nstripe, nlists) = stripe (n-1) xss
        in (x:nstripe, xs:nlists)


    -- diagonal' n xss uses stripe n to split up
    -- xss into a chunk of n elements representing the
    -- nth diagonal of the original input, and the rest
    -- of the original input for a recursive call to
    -- diagonal' (n+1)

    diagonal' _ [] = []
    diagonal' n xss =
        let (str, xss') = stripe n xss
        in str ++ diagonal' (n+1) xss'
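Putting the pieces of this answer together gives the following single module, which can be loaded in GHCi. The definitions are the ones shown above; the type signature on `joinStrings`, the accessor `symbols`, the grammar `anbn` and the value `firstFour` are illustrative additions that do not come from the answer or the blog post.

```haskell
import Prelude hiding (String)

-- A string of symbols, as opposed to a list of possible strings.
newtype String a = String [a]
    deriving Show

joinStrings :: String a -> String a -> String a
joinStrings (String a1) (String a2) = String (a1 ++ a2)

data Symbol a
    = Terminal a
    | Nonterminal [[Symbol a]] -- a disjunction of juxtapositions

-- Fair interleaving of a list of lists (body from the Omega source above).
diagonal :: [[a]] -> [a]
diagonal = diagonal' 0
  where
    stripe 0 xss          = ([], xss)
    stripe _ []           = ([], [])
    stripe n ([]:xss)     = stripe n xss
    stripe n ((x:xs):xss) =
        let (nstripe, nlists) = stripe (n - 1) xss
        in (x : nstripe, xs : nlists)

    diagonal' _ []  = []
    diagonal' n xss =
        let (str, xss') = stripe n xss
        in str ++ diagonal' (n + 1) xss'

crossProduct :: [a] -> [b] -> [(a, b)]
crossProduct as bs = diagonal [[(a, b) | b <- bs] | a <- as]

-- Strings accepted by a sequence of symbols.
enumerateSymbols :: [Symbol a] -> [String a]
enumerateSymbols []         = [String []]
enumerateSymbols (sym:syms) =
    [ joinStrings prefix suffix
    | (prefix, suffix) <- crossProduct (enumerate sym) (enumerateSymbols syms) ]

-- Strings accepted by a single symbol.
enumerate :: Symbol a -> [String a]
enumerate (Terminal a)       = [String [a]]
enumerate (Nonterminal alts) = diagonal (map enumerateSymbols alts)

-- Illustrative additions below (not from the answer or the blog post).

-- Unwrap a String so that results print as ordinary lists.
symbols :: String a -> [a]
symbols (String xs) = xs

-- Example grammar: S -> <empty> | 'a' S 'b'
anbn :: Symbol Char
anbn = Nonterminal [[], [Terminal 'a', anbn, Terminal 'b']]

firstFour :: [[Char]]
firstFour = map symbols (take 4 (enumerate anbn))
-- ["","ab","aabb","aaabbb"]
```

The grammar refers to itself directly, which works only because Haskell evaluates lazily. A translation to Python or JavaScript would likely need generators (or explicit thunks) for both the recursive grammar and the lists passed to diagonal.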

It is also well worth reading this paper on the general concepts of diagonalisation and breadth-first search of infinite structures.