Question

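I'm having trouble writing a simple function without repeating myself too much; below is a simplified example. The real program I'm trying to write is a port of an in-memory database for a BI server from Python. In reality there are more distinct types (around 8) and more logic, most of which can be expressed as functions operating on a polymorphic type such as Vector a, but some logic still has to deal with values of the different types.

For efficiency reasons, wrapping each value separately (using a type like [(Int, WrappedValue)]) is not an option - in the real code I'm using unboxed vectors.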

type Vector a = [(Int, a)] -- always sorted by fst

data WrappedVector = -- in fact there are 8 of them
      FloatVector (Vector Float)
    | IntVector (Vector Int)
    deriving (Eq, Show)

query :: [WrappedVector] -> [WrappedVector] -- equal length
query vectors = map (filterIndexW commonIndices) vectors
    where
        commonIndices = intersection [mapFstW vector | vector <- vectors]

intersection :: [[Int]] -> [Int]
intersection = head -- dummy impl. (intersection of sorted vectors)

filterIndex :: Eq a => [Int] -> Vector a -> Vector a
filterIndex indices vector = -- sample inefficient implementation
    filter (\(idx, _) -> idx `elem` indices) vector

mapFst :: Vector a -> [Int]
mapFst = map fst

-- ideally I would stop here, but I must repeat this for all possible types
-- and kinds of wrapped containers and functions:

filterIndexW :: [Int] -> WrappedVector -> WrappedVector
filterIndexW indices vw = case vw of
    FloatVector v -> FloatVector $ filterIndex indices v
    IntVector   v -> IntVector $ filterIndex indices v

mapFstW :: WrappedVector -> [Int]
mapFstW vw = case vw of
    FloatVector v -> map fst v
    IntVector   v -> map fst v

-- sample usage of query
main = putStrLn $ show $ query [FloatVector [(1, 12), (2, -2)],
                                IntVector   [(2, 17), (3, -10)]]

How can I express code like this without the wrapping and unwrapping done in functions such as mapFstW and filterIndexW?

Answer 1

If you're willing to use a few compiler extensions, ExistentialQuantification solves your problem nicely.

{-# LANGUAGE ExistentialQuantification #-}
{-# LANGUAGE StandaloneDeriving #-}
module VectorTest where

type PrimVector a = [(Int, a)]

data Vector = forall a . Show a => Vector (PrimVector a)

deriving instance Show Vector

query :: [Vector] -> [Vector] -- equal length
query vectors = map (filterIndex commonIndices) vectors
    where
        commonIndices = intersection [mapFst vector | vector <- vectors]

intersection :: [[Int]] -> [Int]
intersection = head -- dummy impl. (intersection of sorted vectors)

filterIndex :: [Int] -> Vector -> Vector
filterIndex indices (Vector vector) = -- sample inefficient implementation
    Vector $ filter (\(idx, _) -> idx `elem` indices) vector

mapFst :: Vector -> [Int]
mapFst (Vector l) = map fst l

-- sample usage of query
main = putStrLn $ show $ query [Vector [(1, 12), (2, -2)],
                                Vector [(2, 17), (3, -10)]]

You can drop the StandaloneDeriving requirement if you write a manual Show instance for Vector, for example:

instance Show Vector where
    show (Vector v) = show v
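
One caveat with the existential wrapper: once a vector is wrapped, only the operations supplied by the class context (here Show) are available on its elements. The question mentions logic that must deal with the concrete value types; a minimal sketch of one way to recover them, assuming you are willing to add a Typeable constraint, is shown below. The names TVector and sumIfInt are hypothetical and not part of the answer above.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.Typeable (Typeable, cast)

type PrimVector a = [(Int, a)]

-- Hypothetical variant of Vector that also carries a Typeable dictionary,
-- so the hidden element type can be tested at run time.
data TVector = forall a. (Show a, Typeable a) => TVector (PrimVector a)

-- Type-specific logic: sum the values only if the payload really is Int.
-- cast returns Nothing when the hidden type is anything else.
sumIfInt :: TVector -> Maybe Int
sumIfInt (TVector v) = sum . map snd <$> (cast v :: Maybe (PrimVector Int))
```

The run-time type test is the price of hiding the type; code that works for every element type does not need it.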

Answer 2

The standard way to wrap a single type without a performance hit is a newtype:

{-# LANGUAGE GeneralizedNewtypeDeriving #-} -- so we can derive Num
newtype MyInt = My Int deriving (Eq,Ord,Show,Num)
newtype AType a = An a deriving (Show, Eq)

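because it only creates a distinction at the type level - the data representation is identical, since the wrappers are compiled away. You can even specify that the values are unboxed, but... this doesn't help you, because you are wrapping several types.

The real problem is that you're trying to express a dynamically typed solution in a statically typed language. Dynamic typing necessarily carries a performance cost, which is hidden from you in a dynamic language but made explicit here by the constructor tags.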
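
As a small illustration of the "same representation" point, here is a sketch using coerce from Data.Coerce, which converts between a newtype and its underlying type, even inside containers, without traversing the data. The helper names wrapAll and unwrapAll are made up for this example.

```haskell
import Data.Coerce (coerce)

newtype MyInt = My Int deriving (Eq, Show)

-- Both directions are free at run time: the list is not traversed,
-- because MyInt and Int share one representation.
wrapAll :: [(Int, Int)] -> [(Int, MyInt)]
wrapAll = coerce

unwrapAll :: [(Int, MyInt)] -> [(Int, Int)]
unwrapAll = coerce
```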

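You have two options:

1. Accept that dynamic typing involves extra runtime checks compared with static typing, and live with the ugliness.
2. Reject the need for dynamic typing: accept that polymorphic typing tidies up all the code and moves the type checking to compile time and to data acquisition.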
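
If you do stay with the first option, the ugliness can at least be confined to one place. The following is a rough sketch, assuming the RankNTypes extension and the WrappedVector type from the question; mapW and withW are hypothetical helpers, and they are the only functions that inspect the constructor tags.

```haskell
{-# LANGUAGE RankNTypes #-}

type Vector a = [(Int, a)]

data WrappedVector
    = FloatVector (Vector Float)
    | IntVector (Vector Int)
    deriving (Eq, Show)

-- Apply a type-preserving polymorphic function under the wrapper.
mapW :: (forall a. Vector a -> Vector a) -> WrappedVector -> WrappedVector
mapW f (FloatVector v) = FloatVector (f v)
mapW f (IntVector v)   = IntVector (f v)

-- Run a polymorphic consumer on the wrapped vector.
withW :: (forall a. Vector a -> r) -> WrappedVector -> r
withW f (FloatVector v) = f v
withW f (IntVector v)   = f v

filterIndex :: [Int] -> Vector a -> Vector a
filterIndex indices = filter (\(idx, _) -> idx `elem` indices)

-- The wrapped versions from the question become one-liners.
filterIndexW :: [Int] -> WrappedVector -> WrappedVector
filterIndexW indices = mapW (filterIndex indices)

mapFstW :: WrappedVector -> [Int]
mapFstW = withW (map fst)
```

Adding a ninth constructor then means touching mapW and withW only, not every wrapped function.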

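I feel that 2 is by far the best solution: you should give up trying to write a program that enumerates every type you want to use, and instead write it to work with any type. It's neat, clear and efficient. You check validity and deal with it once, and then stop worrying about it.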
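
A minimal sketch of what the second option could look like for the query from the question, assuming all vectors in one query share a single element type that was fixed when the data was loaded. The intersect2 helper is an assumption added here to replace the dummy intersection; it expects strictly ascending index lists.

```haskell
type Vector a = [(Int, a)] -- always sorted by fst

-- Intersection of two strictly ascending index lists.
intersect2 :: [Int] -> [Int] -> [Int]
intersect2 xs@(x:xt) ys@(y:yt)
    | x < y     = intersect2 xt ys
    | x > y     = intersect2 xs yt
    | otherwise = x : intersect2 xt yt
intersect2 _ _ = []

-- Indices common to every list; an empty input yields no indices.
intersection :: [[Int]] -> [Int]
intersection []     = []
intersection (l:ls) = foldl intersect2 l ls

filterIndex :: [Int] -> Vector a -> Vector a
filterIndex indices = filter (\(idx, _) -> idx `elem` indices)

-- No wrapper and no case analysis: the same code serves every element type.
query :: [Vector a] -> [Vector a]
query vectors = map (filterIndex common) vectors
  where
    common = intersection (map (map fst) vectors)
```

The trade-off is that a single call can no longer mix Float and Int vectors; that decision moves to the point where the data is acquired.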

Dynamic typing of finite-domain containers of primitive types
