在Haskell中创建唯一标签

时间:2011-06-10 20:03:09

标签: haskell code-generation compiler-construction

我正在为Haskell中的简单命令式语言编写一个编译器,输出Java字节码。我已经到了我发出字节码的抽象表示的地步。

在编写用于编译if语句的代码时遇到了一些麻烦。要实现if语句,我需要跳转到标签。因此,我需要为该标签生成一个名称,该名称必须是唯一的。

我的第一个想法是通过compileStatement来处理一些状态,即

compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]

当然,compilerStatement是递归的,所以使用这种方法需要我从递归调用中将唯一ID生成器的状态传递回upp:

compileStatement :: Statement -> UniqueIDState -> (UniqueIdState, [AbstractInstruction])

这似乎有点笨拙,特别是如果我意识到我需要在将来携带更多状态;有更优雅的方式吗?

3 个答案:

答案 0 :(得分:5)

您需要“独特的供应”。在Haskell中执行此操作的常用方法是通过状态monad线程计数器,这会自动化您描述的管道问题。

答案 1 :(得分:4)

正如所说,你可以使用State Monad。或者您可以使用“Unique Supply Monad”。

http://www.haskell.org/ghc//docs/6.10.3/html/libraries/ghc/UniqSupply.html#3

它只是一个国家Moand的包装:

http://en.wikibooks.org/wiki/Haskell/Practical_monads#Make_a_monad_over_state

答案 2 :(得分:2)

如果您仅有的工具是锤子,我想这很诱人,就像对待钉子一样对待一切。

亚伯拉罕·马斯洛。

有什么不一样的地方-一个唯一的供应商,它不是类的成员Monad。碰巧的是,您几乎可以使用原始类型签名了:

compileStatement :: Statement -> UniqueIDState -> [AbstractInstruction]

如果仅 的要求是每个标签都是唯一的-无需计算使用了多少标签,在相同情况下也可以提供相同的标识符,等等-您可以使用一种侵入性较小的技术

摘自John Launchbury和Simon Peyton Jones的State in Haskell第39-40页:

newUniqueSupply :: IO UniqueSupply
splitUniqueSupply :: UniqueSupply -> (UniqueSupply, UniqueSupply)
getUnique  :: UniqueSupply -> Unique

instance Eq Unique
instance Ord Unique
instance Text Unique

--    interface     --
-- ================ --
--  implementation  --

data UniqueSupply = US Unique UniqueSupply UniqueSupply

type Unique = Int

 -- where all the action happens!
newUniqueSupply :: IO UniqueSupply
newUniqueSupply
  = newVar 0             `thenST` \ uvar ->
    let
      next :: IO Unique
      next = interleaveST (
                readVar uvar         `thenST` \ u ->
                writeVar uvar (u+1)  `thenST_`
                returnStrictlyST u
              )

      supply :: IO UniqueSupply
      supply = interleaveST (
                 next         `thenST` \ u ->
                 supply       `thenST` \ s1 ->
                 supply       `thenST` \ s2 ->
                 returnST (US u s1 s2)
                )

    in
    supply

 -- bits so boring they're not even in the paper...
splitUniqueSupply (US _ s1 s2) = (s1, s2)
getUnique (US u _ _) = u

是的... 1996年的Haskell样本-让我们重新整理一下吧:

module UniqueSupply(
    Unique, UniqueSupply,
    newUniqueSupply, splitUniqueSupply, getUnique
) where

import Control.Monad    (liftM3)
import Data.IORef       (IORef, newIORef, atomicModifyIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

newtype Unique    =  U Int deriving (Eq, Ord, Read, Show)

data UniqueSupply =  US Unique UniqueSupply UniqueSupply

newUniqueSupply   :: IO UniqueSupply
newUniqueSupply
  = do uvar <- newIORef 0
       let next   :: IO Unique
           next   =  unsafeInterleaveIO (atomicModifyIORef uvar (\u -> (u+1, U u)))

           supply :: IO UniqueSupply
           supply =  unsafeInterleaveIO (liftM3 US next supply supply)
       supply

splitUniqueSupply :: UniqueSupply -> (UniqueSupply, UniqueSupply)
splitUniqueSupply (US _ s1 s2) =  (s1, s2)

getUnique :: UniqueSupply -> Unique
getUnique (US u _ _) =  u

现在它又可以工作了,有一些烦恼要处理:

  • 使用两种类型;

  • 缺乏多态性;

  • 固定的生成方式;

  • 错误重用的可能性。

最后一点特别有趣。假设:

data Statement =
    ... | If Statement Statement Statement | ...

然后,如果:

compileStatement (If c t e) s =
    case splitUniqueSupply s of
      (s1, s2) -> case splitUniqueSupply s2 of
                    (s3, s4) -> buildCondJump (compileStatement c s1)
                                              (compileStatement t s3)
                                              (compileStatement e s4)

错误地更改为:

compileStatement (If c t e) s =
    case splitUniqueSupply s of
      (s1, s2) -> case splitUniqueSupply s2 of
                    (s3, s4) -> buildCondJump (compileStatement c s)
                                              (compileStatement t s)
                                              (compileStatement e s)

不仅错误地重用了UniqueSupplyUnique的值,而且如果对compileStatement的任何递归调用都大量使用供应,则存在空间泄漏的可能性。

我们现在考虑第二点:缺乏多态性。假设存在合适的类型:

data Fresh a =  Fresh a (Fresh a) (Fresh a)

freshNew     :: ... -> IO (Fresh a)

splitFresh   :: Fresh a -> (Fresh a, Fresh a)

pluckFresh   :: Fresh a -> a

这意味着:

instance Functor Fresh where
    fmap h (Fresh u s1 s2) =  Fresh (h u) (fmap h s1) (fmap h s2)

然后启发:

freshNew     :: (Int -> a) -> IO (Fresh a)
freshNew g   =  fmap (fmap g) freshInts

splitFresh   :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh _ s1 s2) =  (s1, s2)

pluckFresh   :: Fresh a -> a
pluckFresh (Fresh u _ _) =  u

然后我们可以将freshInts设为私有:

freshInts    :: IO (Fresh Int)
freshInts    =  do uvar <- newIORef 0 
                   let incr u =  (u+1, u)
                       next   =  unsafeInterleaveIO $
                                 atomicModifyIORef uvar incr
                       supply =  unsafeInterleaveIO $
                                 liftM3 Fresh next supply supply
                   supply

如果用户仅需要Int值:

do            .
              .
              .
   int_supply <- freshNew id  {- id x = x -}
              .
              .
              .

作为奖励,它还补救了两种类型的使用以及固定的生成方式(第一点和第三点)。 Fresh新模块的时间:

module Fresh(
    Fresh,
    freshNew, splitFresh, pluckFresh
) where

import Control.Monad    (liftM3)
import Data.IORef       (IORef, newIORef, atomicModifyIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

data Fresh a =  Fresh a (Fresh a) (Fresh a)

instance Functor Fresh where
    fmap h (Fresh u s1 s2) =  Fresh (h u) (fmap h s1) (fmap h s2)

freshNew     :: (Int -> a) -> IO (Fresh a)
freshNew g   =  fmap (fmap g) freshInts

splitFresh   :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh _ s1 s2) =  (s1, s2)

pluckFresh   :: Fresh a -> a
pluckFresh (Fresh u _ _) =  u

-- local definition
freshInts    :: IO (Fresh Int)
freshInts    =  do uvar <- newIORef 0 
                   let incr u =  (u+1, u)
                       next   =  unsafeInterleaveIO $
                                 atomicModifyIORef uvar incr
                       supply =  unsafeInterleaveIO $
                                 liftM3 Fresh next supply supply
                   supply

现在要进行重用之谜,并尝试尝试一个答案:

freshNew     :: (Int -> a) -> IO (Fresh a)
freshNew g   =  do z <- newIORef ()
                   fmap (fmap g) (freshInts z)

freshInts    :: IORef () -> IO (Fresh Int)
freshInts z  =  do let using   :: () -> (a, ())
                       using x =  (error "already used!", x)
                   () <- atomicModifyIORef z using
                   z1 <- newIORef ()
                   z2 <- newIORef ()
                   let incr u =  (u+1, u)
                       next   =  unsafeInterleaveIO $
                                 atomicModifyIORef uvar incr
                       supply =  unsafeInterleaveIO $
                                 liftM3 Fresh next (freshInts z1) (freshInts z2)
                   supply

...是的,那是一个原型好吗-我们可以做得更好吗?

到目前为止,splitFreshpluckFresh都是微不足道的:

splitFresh   :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh _ s1 s2) =  (s1, s2)

pluckFresh   :: Fresh a -> a
pluckFresh (Fresh u _ _) =  u

现在可以将某些作品freshIntsfreshNew转移给他们吗?

  • 如果splitFresh可以直接生成一对子树,则Fresh的值会更简单:

     data Fresh a =  Fresh ... {- no subtrees -}
    
     splitFresh :: Fresh a -> (Fresh a, Fresh a)
     splitFresh (Fresh g ...) =  (Fresh ..., Fresh ...)
    
  • 如果pluckFresh可以访问生成器功能-g中的freshNew,则可以直接提供所需的唯一值:

     data Fresh a =  Fresh (... -> a) ...
    
     pluckFresh :: Fresh a -> a
     pluckFresh (Fresh g ...) =  (g ...)
    

那又怎么样呢?

data Fresh a =  Fresh (Int -> a) U

splitFresh   :: Fresh a -> (Fresh a, Fresh a)
splitFresh (Fresh g n) =  (Fresh g n1, Fresh g n2) where 
                          (n1, n2) =  splitU n
   
pluckFresh   :: Fresh a -> a
pluckFresh (Fresh g n) =  g (intOfU n)

其中:

splitU :: U -> (U, U)
intOfU :: U -> Int

可以使freshInts如此简单:

freshInts    :: IO (Fresh Int)
freshInts    =  do n <- initialU
                   return (Fresh (\x -> x) n)

假设:

initialU :: IO U

嗯-freshInts的定义不太正确。然后是intOfU-它与其他地方看到的东西很奇怪……

[...]在命令式程序中,您可以简单地为每个标识符调用GenSym(),从全局供应中分配唯一的名称,并对供应产生副作用,以便随后对GenSym()将带来新的价值。

(摘自Launchbury和Peyton-Jones论文的第39页。)

让我们多加思考:

  • genSym由于其(与Haskell外部)与全局(可变)供应源之间的交互,因此其类型为:

    genSym :: IO Int
    

    以防止在纯上下文中使用它。

  • 早期的原型具有自己的外部交互功能-它使用可变的引用:

    freshInts :: IORef () -> IO (Fresh Int)
    

    防止重复使用Fresh值。

...带有抽象的IO类型,通过两种类型签名中的出现来指示那些外部交互的存在。

如果我们(谨慎地!)假设:

  • intOfUgenSym的伪装;

  • U类型也可以作为外界互动的指标;

即:

type U =  OI
genSym :: OI -> Int

intOfU :: U -> Int
intOfU =  ... $ genSym

...这意味着:

data Fresh a =  Fresh (Int -> a) OI

splitU       :: OI -> (OI, OI)
   

这看起来很有希望-我们现在可以将genSym移至freshInts

data Fresh a =  Fresh (OI -> a) OI

pluckFresh   :: Fresh a -> a
pluckFresh (Fresh g n) =  g n

freshInts    :: IO (Fresh Int)
freshInts    =  do n <- initialU
                   uvar <- newIORef 0
                   let incr n =  (n + 1, n)

                       genSym :: IO Int
                       genSym =  atomicModifyIORef uvar incr
                            
                       intOfU :: OI -> Int
                       intOfU =  ... $ genSym

                   return (Fresh intOfU n)

这看起来更明智-其他问题呢?

instance Functor Fresh where
    fmap f (Fresh g n) =  Fresh (f . g) n

freshNew     :: (Int -> a) -> IO (Fresh a)
freshNew g   =  do n <- initialU
                   uvar <- newIORef 0
                   let incr n =  (n + 1, n)

                       genSym :: IO Int
                       genSym =  atomicModifyIORef uvar incr
                            
                       intOfU :: OI -> Int
                       intOfU =  ... $ genSym

                   return (Fresh (g . intOfU) n)

这看起来非常有前景-我们不再需要本地定义freshInts!我们只需要定义 U OIinitialUsplitU-,就需要考虑一些事项:

  • 还记得Fresh中错误地重用compileStatement值的问题吗?好吧,这些OI值也存在相同的问题:

    pourFresh  :: Fresh a -> [a]
    pourFresh (Fresh g n) = map g (pourU n)
    
    pourU      :: OI -> [OI]
    pourU n    =  n1 : pourU n1 where (n1, n2) = splitU n
    

    OI类型的构造函数的现成可用性将使该问题更加严重。

  • 我们仍然假设此OI类型表明存在外部互动-很像这种简单的IO ...

这表明OI类型应该是抽象的。当我们正在处理原型并且您可能已经在使用它时,最简单的选择也许就是根据需要使用 Glaskell GHC扩展。

深呼吸的时间,并进行了一些更改:

  -- for GHC 8.6.5
{-# LANGUAGE MagicHash, UnboxedTuples #-}
import Data.Char  (isSpace)
import Data.IORef (IORef, newIORef, atomicModifyIORef)
import Prelude    (Int, String, Eq(..), Functor(..), Num(..))
import Prelude    ((.), ($), (++), error, all)
import GHC.Base   (IO(..), State#, MutVar#, RealWorld)
import GHC.Base   (seq#, newMutVar#, atomicModifyMutVar#, noDuplicate#)

data OI                 =  OI OI#

type OI#                =  String -> State# RealWorld

type IO# a              =  State# RealWorld -> (# State# RealWorld, a #)


part#                   :: OI# -> (# OI#, OI# #)
part# h                 =  case h "partOI" of
                             s -> case dispense# s of
                                    (# s', h1 #) ->
                                      case dispense# s' of
                                        (# _, h2 #) -> (# h1, h2 #)

dispense#               :: IO# OI#
dispense# s             =  case newMutVar# () s of
                             (# s', r #) -> (# s', expire# s' r #)

expire#                 :: State# s -> MutVar# s () -> String -> State# s
expire# s r name        =  case atomicModifyMutVar# r use s of
                             (# s', () #) -> s'
                           where
                               use x   =  (error nowUsed, x)
                               nowUsed =  name' ++ ": already expired"
                               name'   =  if all isSpace name then "(unknown)"
                                          else name

invokes#                :: Monomo a => String -> IO# a -> OI# -> a
(name `invokes#` act) h =  case act (noDuplicate# (h name)) of (# _, t #) -> t

class Monomo a

您现在可以再次开始呼吸;这里也有一些名称更改:

partFresh               :: Fresh a -> (Fresh a, Fresh a)
partFresh (Fresh g u)   =  case partOI u of
                             (u1, u2) -> (Fresh g u1, Fresh g u2)
   
pluckFresh              :: Fresh a -> a
pluckFresh (Fresh g u)  =  g u

freshNew                :: (Int -> a) -> IO (Fresh a)
freshNew g              =  do uvar <- newIORef 0
                              let incr n =  (n + 1, n)

                                  genSym :: IO Int
                                  genSym =  atomicModifyIORef uvar incr
                            
                                  gensym :: OI -> Int
                                  gensym =  "gensym" `invokes` genSym

                              runOI (Fresh (g . gensym))

instance Monomo Int

因此,您已经拥有了:一个简单的唯一电源(除了一个定义)是不含monad的:

  • 您只需要定义例如:

    nextID :: Int -> ID
    
  • 对类型签名的最终更改是适度的:

    compileStatement :: Statement -> Fresh ID -> [AbstractInstruction]
    

但是,如果您确实需要 do,则可以使用Fresh作为单子类型的基础,例如:

type Supply i a = Fresh i -> a

unit     :: a -> Supply i a
unit x   =  \u -> partFresh u `seq` x

bind     :: Supply i a -> (a -> Supply i b) -> Supply i b
bind m k =  \u -> case partFresh u of (u1, u2) -> (\x -> x `seq` k x u2) (m u1)

其中:

 -- for GHC 8.6.5
{-# LANGUAGE CPP #-}
#define during seq
import qualified Prelude(during)

{-# NOINLINE seq #-}
infixr 0 `seq`
seq     :: a -> b -> b
seq x y = Prelude.during x (case x of _ -> y)

或:

 -- for GHC 8.6.5
{-# LANGUAGE CPP #-}
#define during seq
import qualified Prelude(during)
import GHC.Base(lazy)

infixr 0 `seq`
seq     :: a -> b -> b
seq x y = Prelude.during x (lazy y)

...因为Prelude.seq isn't actually sequential

(是的:这些定义是特定于GHC的;对于其他Haskell实现,最简单的选择可能是添加一个新的原语。对于扩展本身,它们与每个定义一起使用。)

嗯...很有趣:

  -- for GHC 8.6.5
{-# LANGUAGE MagicHash, UnboxedTuples #-}
module Fresh(
    Fresh,
    freshNew, partFresh, pluckFresh
) where
import Data.Char  (isSpace)
import Data.IORef (IORef, newIORef, atomicModifyIORef)
import Prelude    (Int, String, Eq(..), Functor(..), Num(..))
import Prelude    ((.), ($), (++), error, all)
import GHC.Base   (IO(..), State#, MutVar#, RealWorld)
import GHC.Base   (seq#, newMutVar#, atomicModifyMutVar#, noDuplicate#)

partFresh               :: Fresh a -> (Fresh a, Fresh a)
partFresh (Fresh g u)   =  case partOI u of
                             (u1, u2) -> (Fresh g u1, Fresh g u2)
   
pluckFresh              :: Fresh a -> a
pluckFresh (Fresh g u)  =  g u

freshNew                :: (Int -> a) -> IO (Fresh a)
freshNew g              =  do uvar <- newIORef 0
                              let incr n =  (n + 1, n)

                                  genSym :: IO Int
                                  genSym =  atomicModifyIORef uvar incr
                            
                                  gensym :: OI -> Int
                                  gensym =  "gensym" `invokes` genSym

                              runOI (Fresh (g . gensym))

instance Functor Fresh where
    fmap f (Fresh g n) =  Fresh (f . g) n

-- local definitions --
data Fresh a            =  Fresh (OI -> a) OI

partOI                  :: OI -> (OI, OI)
partOI (OI h)           =  case part# h of (# h1, h2 #) -> (OI h1, OI h2)

runOI                   :: (OI -> a) -> IO a
runOI g                 =  IO $ \s -> case dispense# s of
                                        (# s', h #) -> seq# (g (OI h)) s'

invokes                 :: Monomo a => String -> IO a -> OI -> a
(name `invokes` IO act) (OI h)
                        =  (name `invokes#` act) h

class Monomo a

 -- extended definitions --
data OI                 =  OI OI#

type OI#                =  String -> State# RealWorld

type IO# a              =  State# RealWorld -> (# State# RealWorld, a #)


part#                   :: OI# -> (# OI#, OI# #)
part# h                 =  case h "partOI" of
                             s -> case dispense# s of
                                    (# s', h1 #) ->
                                      case dispense# s' of
                                        (# _, h2 #) -> (# h1, h2 #)

dispense#               :: IO# OI#
dispense# s             =  case newMutVar# () s of
                             (# s', r #) -> (# s', expire# s' r #)

expire#                 :: State# s -> MutVar# s () -> String -> State# s
expire# s r name        =  case atomicModifyMutVar# r use s of
                             (# s', () #) -> s'
                           where
                               use x   =  (error nowUsed, x)
                               nowUsed =  name' ++ ": already expired"
                               name'   =  if all isSpace name then "(unknown)"
                                          else name

invokes#                :: Monomo a => String -> IO# a -> OI# -> a
(name `invokes#` act) h =  case act (noDuplicate# (h name)) of (# _, t #) -> t

 -- supplemental instances --
instance Monomo Int

...我们甚至设法摆脱了unsafe...的定义-很好!

P.S:如果您想知道特殊的Monomo类,可以在Standard ML ...的历史中找到线索...