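How long can a type constructor's name be?

Asked: 2014-06-20 21:25:20

Tags: haskell

Is there a limit on the length of a constructor's name? What are the consequences of having an absurdly long constructor name?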

data

1 answer:

Answer 0 (score: 28)

If we look at the source for GHC, we can find the type used to define data constructors. It is named DataCon, and it has the following field:

dcName    :: Name,  -- This is the name of the *source data con*

Going down the rabbit hole, Name contains an OccName:

n_occ  :: !OccName,     -- Its occurrence name

OccName stores the name as a FastString:

data OccName = OccName
    { occNameSpace :: !NameSpace
    , occNameFS :: !FastString
    }
    deriving Typeable

Finally, FastString is just a ByteString, together with a precomputed length and an Int that tags it for fast comparison:

data FastString = FastString {
      uniq :: {-# UNPACK #-} !Int, -- unique id
      n_chars :: {-# UNPACK #-} !Int, -- number of chars
      fs_bs :: {-# UNPACK #-} !ByteString,
      fs_ref :: {-# UNPACK #-} !(IORef (Maybe FastZString))
  } deriving Typeable

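Nothing in this data type limits the size of the string (other than maxBound :: Int, obviously). However, that does not rule out bugs elsewhere in the code that could cause problems.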
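To make the chain above concrete, here is a minimal sketch that models it with simplified stand-in types. These are not GHC's real definitions: only the fields quoted above are kept (NameSpace, the unique id and the IORef cache are dropped), and mkDataCon and conNameLength are hypothetical helpers introduced purely for illustration.

```haskell
module Main where

import qualified Data.ByteString.Char8 as BS

-- Simplified stand-ins for GHC's internal types (NOT the real definitions).
data FastString = FastString
  { n_chars :: !Int            -- number of characters, precomputed
  , fs_bs   :: !BS.ByteString  -- the bytes of the name
  }

newtype OccName = OccName { occNameFS :: FastString }
newtype Name    = Name    { n_occ     :: OccName }
newtype DataCon = DataCon { dcName    :: Name }

-- Hypothetical helper: wrap a source-level name in the nested records.
mkDataCon :: String -> DataCon
mkDataCon s = DataCon (Name (OccName (FastString (length s) (BS.pack s))))

-- Follow DataCon -> Name -> OccName -> FastString and read the stored length.
conNameLength :: DataCon -> Int
conNameLength = n_chars . occNameFS . n_occ . dcName

main :: IO ()
main = do
  -- A 1000-character constructor name, like the ones tested later on.
  print (conNameLength (mkDataCon (replicate 1000 'F')))
  -- The only bound the length field itself imposes.
  print (maxBound :: Int)
```

In this model the length is an ordinary Int, so the representation alone caps a name at maxBound :: Int characters (2^63 - 1 on a 64-bit GHC). Whether the rest of the compiler copes with such names is a separate question.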

So we need a program to test it:

{-# LANGUAGE BangPatterns #-}
module Main where
import Control.Monad (forM_)
import System.IO (hPutStr, hFileSize, hClose)
import System.Exit (ExitCode(..))
import System.IO.Temp (withSystemTempFile)
import Data.Time.Clock.POSIX (getPOSIXTime)
import System.Process (readProcessWithExitCode)

-- timing functions (from criterion)
getTime :: IO Double
getTime = (fromRational . toRational) `fmap` getPOSIXTime

time :: IO a -> IO (Double, a)
time act = do
  start <- getTime
  result <- act
  end <- getTime
  let !delta = end - start
  return (delta, result)



-- make a constructor like
-- data C = FFFFFF
makeConstructor :: Int -> String
makeConstructor size = "data C = " ++ replicate size 'F'

wrapWithMainModule :: String -> String
wrapWithMainModule code = unlines ["module Main where", "main = return ()", code]

data CompileResults = CompileResults {
  timeTaken :: Double,
  success :: Bool,
  outputFileSize :: Integer
  } deriving (Show)



compileHsCode :: String -> IO CompileResults
compileHsCode sourceCode = withSystemTempFile "test.hs" $ \path handle -> do
  withSystemTempFile "output.o" $ \outputPath outputHandle -> do
    hPutStr handle $ wrapWithMainModule sourceCode
    hClose handle
    (timeTaken, (exitCode, _, _)) <- time $ readProcessWithExitCode "ghc" ["-c", "-o", outputPath, path] ""
    let success = exitCode == ExitSuccess

    size <- if success then hFileSize outputHandle else return 0
    return $ CompileResults {
      timeTaken = timeTaken
      , success = success
      , outputFileSize = size
      }


testConstructorSizes :: [Int] -> IO ()
testConstructorSizes sizes = forM_ sizes $ \size -> do
  info <- compileHsCode $ makeConstructor size
  putStrLn $ "For Size " ++ show size ++ "\t: " ++ show info



-- Up to 10 million
sizesToTest :: [Int]
sizesToTest = take 7 (iterate (*10) 10)

main :: IO ()
main = testConstructorSizes sizesToTest

Here is the output of running main:

For Size 10     : CompileResults {timeTaken = 0.1390078067779541, success = True, outputFileSize = 1818}
For Size 100    : CompileResults {timeTaken = 0.14700841903686523, success = True, outputFileSize = 2086}
For Size 1000   : CompileResults {timeTaken = 0.1390080451965332, success = True, outputFileSize = 4786}
For Size 10000  : CompileResults {timeTaken = 0.1520085334777832, success = True, outputFileSize = 31786}
For Size 100000 : CompileResults {timeTaken = 0.31201791763305664, success = True, outputFileSize = 301786}
For Size 1000000        : CompileResults {timeTaken = 2.26712965965271, success = True, outputFileSize = 3001786}
For Size 10000000       : CompileResults {timeTaken = 109.2182469367981, success = True, outputFileSize = 30001786}

A few interesting points:

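  1. Notice that past 1 million characters, the time taken increases dramatically. If the growth were linear you would expect a 10x increase from 1 million to 10 million, but it is roughly a 50x change (2.27 s to 109.2 s). This could mean that a constructor with 100 million characters would take about 5000 seconds to compile (I did not test this).
  2. For every entry from size 100 upward, the file size is exactly 1786 + (constructorSize * 3). The size-10 entry is 1818 bytes, two more than the formula gives. So each char takes up three bytes in the object file when used in a constructor name.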
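Both observations can be checked mechanically against the output above. The following is a minimal sketch: the numbers are copied from the run shown earlier (timings rounded to three decimals), and the names measuredSizes, predictedSize, mismatches, timings and ratios are introduced here for illustration only.

```haskell
module Main where

-- (constructor size, object file size in bytes), copied from the output above.
measuredSizes :: [(Integer, Integer)]
measuredSizes =
  [ (10, 1818), (100, 2086), (1000, 4786), (10000, 31786)
  , (100000, 301786), (1000000, 3001786), (10000000, 30001786) ]

-- The formula from point 2: a fixed overhead plus three bytes per character.
predictedSize :: Integer -> Integer
predictedSize n = 1786 + 3 * n

-- Sizes where the measurement differs from the formula, with the difference.
mismatches :: [(Integer, Integer)]
mismatches =
  [ (n, actual - predictedSize n)
  | (n, actual) <- measuredSizes
  , actual /= predictedSize n ]

-- Compile times in seconds, rounded from the output above.
timings :: [Double]
timings = [0.139, 0.147, 0.139, 0.152, 0.312, 2.267, 109.218]

-- Slowdown factor for each tenfold increase in constructor size.
ratios :: [Double]
ratios = zipWith (/) (drop 1 timings) timings

main :: IO ()
main = do
  putStrLn ("Mismatches (size, extra bytes): " ++ show mismatches)
  mapM_ print ratios
```

The only mismatch it reports is the size-10 entry, which is two bytes larger than the formula predicts. The last ratio comes out at about 48, which is the "x50" jump from point 1.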