在类型系统中建模串行格式,如Servant

时间:2016-11-18 00:53:22

标签: haskell serialization type-level-computation servant

我正在进行API集成,忽略XML或JSON的存在,而只是附加字符数据。 (Metro2格式,如果感兴趣的话)

我正在简化,但想象一个人需要像这样序列化:

  • 在pos 0,4个字符:消息中的字节数
  • 在pos 5:6 chars:"PERSON"硬编码
  • 位置11:20个字符:名字,左对齐和空格填充
  • 在pos 21:8 chars:生日,YYYYMMDD
  • 在pos 29:3 chars:年龄,右对齐和零填充

数字字段始终是右对齐和零填充的。文本字段始终左对齐并填充空格。

例如:

"0032PERSONDAVID WILCOX        19820711035"

我可以在类型系统中表达这个吗?像servant那样做什么?像这样的东西?

newtype ByteLength = ByteLength Int
newtype Age = Age Int
-- etc

type PersonMessage
     = Field ByteLength '0
    :| Field "PERSON" '5
    :| Field Name '11
    :| Field Date '21
    :| Field Age '29

-- :| is a theoretical type operator, like :> in servant
-- the number is the expected offset
-- the length of the field is implicit in the type

我可以静态检查我的序列化实现是否与类型匹配?

我可以静态检查第3个字段(Name)的偏移量是否为11?前面字段的长度加起来是11?我假设没有,因为这似乎需要完全依赖类型支持。

这是在正确的轨道上吗?

instance ToMetro Age where
   -- get the length into the type system using a type family?
   field = Numeric '3

   -- express how this is encoded. Would need to use the length from the type family. Or if that doesn't work, put it in the constructor.
   toMetro age = Numeric age

更新:我想静态验证的函数示例:

personToMetro :: Person -> PersonMessage
personToMetro p = error "Make sure that what I return is a PersonMessage"

1 个答案:

答案 0 :(得分:3)

只是为了给你一些灵感,只需要做仆人所做的事情,并为你支持的不同组合器提供不同的类型:

[batch_key, company_id, users_info]

是的,所以这很简陋,但它已经允许你做了

{-# LANGUAGE GADTs, DataKinds, KindSignatures, TypeOperators, ScopedTypeVariables #-}

module Seriavant where

import GHC.TypeLits
import Data.Proxy
import Data.List (stripPrefix)

data Skip (n :: Nat) = Skip deriving Show
data Token (n :: Nat) = Token String deriving Show
data Lit (s :: Symbol) = Lit deriving Show

data (:>>) a b = a :>> b deriving Show
infixr :>>

class Deserialize a where
    deserialize :: String -> Maybe (a, String)

instance (KnownNat n) => Deserialize (Skip n) where
    deserialize s = do
        (_, s') <- trySplit (natVal (Proxy :: Proxy n)) s
        return (Skip, s')

instance (KnownNat n) => Deserialize (Token n) where
    deserialize s = do
        (t, s') <- trySplit (natVal (Proxy :: Proxy n)) s
        return (Token t, s')

instance (KnownSymbol lit) => Deserialize (Lit lit) where
    deserialize s = do
        s' <- stripPrefix (symbolVal (Proxy :: Proxy lit)) s
        return (Lit, s')

instance (Deserialize a, Deserialize b) => Deserialize (a :>> b) where
    deserialize s = do
        (x, s') <- deserialize s
        (y, s'') <- deserialize s'
        return (x :>> y, s'')

trySplit :: Integer -> [a] -> Maybe ([a], [a])
trySplit 0 xs = return ([], xs)
trySplit n (x:xs) = do
    (xs', ys) <- trySplit (n-1) xs
    return (x:xs', ys)
trySplit _ _ = Nothing

的工作原理如下:

type MyFormat = Token 4 :>> Lit "PERSON" :>> Skip 1 :>> Token 4

testDeserialize :: String -> Maybe MyFormat
testDeserialize = fmap fst . deserialize

编辑:原来我完全误解了这个问题,Sean要求序列化,而不是反序列化...但当然我们也可以这样做:

*Seriavant> testDeserialize "1"
Nothing
*Seriavant> testDeserialize "1234PERSON Foo "
Just (Token "1234" :>> (Lit :>> (Skip :>> Token "Foo ")))

(当然这对所有class Serialize a where serialize :: a -> String instance (KnownNat n) => Serialize (Skip n) where serialize Skip = replicate (fromIntegral $ natVal (Proxy :: Proxy n)) ' ' instance (KnownNat n) => Serialize (Token n) where serialize (Token t) = pad (fromIntegral $ natVal (Proxy :: Proxy n)) ' ' t instance (KnownSymbol lit) => Serialize (Lit lit) where serialize Lit = symbolVal (Proxy :: Proxy lit) instance (Serialize a, Serialize b) => Serialize (a :>> b) where serialize (x :>> y) = serialize x ++ serialize y pad :: Int -> a -> [a] -> [a] pad 0 _x0 xs = xs pad n x0 (x:xs) = x : pad (n-1) x0 xs pad n x0 [] = replicate n x0 连接等都有可怕的表现,但这不是重点。

String

当然,如果我们知道格式,我们可以避免那些讨厌的类型注释:

*Seriavant> serialize ((Token "1234" :: Token 4) :>> (Lit :: Lit "FOO") :>> (Skip :: Skip 2) :>> (Token "Bar" :: Token 10))
"1234FOO  Bar       "
type MyFormat = Token 4 :>> Lit "PERSON" :>> Skip 1 :>> Token 4

testSerialize :: MyFormat -> String
testSerialize = serialize