Question

我正在尝试构建一个包含无限位图的惰性数据结构。我想支持以下操作：

true :: InfBitMap

返回True的无限位图，即所有位置的值应为True。
falsify :: InfBitMap -> [Int] -> InfBitMap

将列表中的所有位置设置为False。该列表可能是无限的。例如，伪造真[0,2 ..]将返回一个列表，其中所有（且仅）奇数位置为真。
check :: InfBitMap -> Int -> Bool

检查索引的值。

到目前为止，这是我能做的。

-- InfBitMap will look like [(@), (@, @), (@, @, @, @)..]
type InfBitMap = [Seq Bool]

true :: InfBitMap
true = iterate (\x -> x >< x) $ singleton True

-- O(L * log N) where N is the biggest index in the list checked for later
-- and L is the length of the index list. It is assumed that the list is
-- sorted and unique.
falsify :: InfBitMap -> [Int] -> InfBitMap
falsify ls is = map (falsify' is) ls
     where
         -- Update each sequence with all indices within its length
         -- Basically composes a list of (update pos False) for all positions
         -- within the length of the sequence and then applies it.
         falsify' is l = foldl' (.) id
                                (map ((flip update) False)
                                     (takeWhile (< length l) is))
                         $ l
-- O(log N) where N is the index.
check :: InfBitMap -> Int -> Bool
check ls i = index (fromJust $ find ((> i) . length) ls) i

我想知道是否有一些我缺少的Haskellish概念/数据结构会使我的代码更优雅/更高效（常量对我来说无关紧要，只是顺序）。我试着看拉链和镜片，但它们似乎没有帮助。我想保持更新和检查的复杂性对数（可能只是摊销对数）。

注意：在有人怀疑之前，这不是一个家庭作业问题！

更新：

我刚刚想到支票可以改进为：

-- O(log N) where N is the index.
-- Returns "collapsed" bitmap for later more efficient checks.
check :: InfBitMap -> Int -> (Bool, InfBitMap)
check ls i = (index l i, ls')
    where
        ls'@(l:_) = dropWhile ((<= i) . length) ls

可以将其转换为Monad以实现代码清洁。

Answer 1

众所周知的integer trie略有变化似乎适用于此。

{-# LANGUAGE DeriveFunctor #-}

data Trie a = Trie a (Trie a) (Trie a) deriving (Functor)

true :: Trie Bool
true = Trie True true true

-- O(log(index))
check :: Trie a -> Int -> a
check t i | i < 0 = error "negative index"
check t i = go t (i + 1) where
    go (Trie a _ _) 1 = a
    go (Trie _ l r) i = go (if even i then l else r) (div i 2)

--O(log(index))
modify :: Trie a -> Int -> (a -> a) -> Trie a
modify t i f | i < 0 = error "negative index"
modify t i f = go t (i + 1) where
    go (Trie a l r) 1 = Trie (f a) l r
    go (Trie a l r) i | even i = Trie a (go l (div i 2)) r
    go (Trie a l r) i = Trie a l (go r (div i 2))

不幸的是我们不能使用modify来实现falsify因为我们无法处理无限的索引列表（所有修改都必须在可以检查trie的元素之前执行）。相反，我们应该做更像合并的事情：

ascIndexModify :: Trie a -> [(Int, a -> a)] -> Trie a
ascIndexModify t is = go 1 t is where
    go _ t [] = t
    go i t@(Trie a l r) ((i', f):is) = case compare i (i' + 1) of
        LT -> Trie a (go (2*i) l ((i', f):is)) (go (2*i+1) r ((i', f):is))
        GT -> go i t is
        EQ -> Trie (f a) (go (2*i) l is) (go (2*i+1) r is)

falsify :: Trie Bool -> [Int] -> Trie Bool
falsify t is = ascIndexModify t [(i, const False) | i <- is]

我们假设is中的严格升序索引，否则我们会跳过trie中的位置甚至是非终止位置，例如在check (falsify t (repeat 0)) 1中。

时间的复杂性因懒惰而有点复杂。在check (falsify t is) index中，我们需要支付额外log 2 index次比较的额外费用，以及进一步length (filter (<index) is)次比较（即跨越所有指数的成本小于我们的比例）仰视）。你可以说它是O(max(log(index), length(filter (<index) is))。无论如何，它肯定比使用O(length is * log (index))为falsify实现的is更好modify。

我们必须记住，在第一个check没有为check支付任何额外费用之后，树节点将被评估一次，而后续falsify - s用于相同的索引。再一次，懒惰使这有点复杂。

当我们想要遍历一个trie的前缀时，这个falsify也表现得非常好。拿这个toList函数：

trieToList :: Trie a -> [a]
trieToList t = go [t] where
    go ts = [a | Trie a _ _ <- ts] 
            ++ go (do {Trie _ l r <- ts; [l, r]})

这是标准的广度优先遍历，在线性时间内。我们计算take n $ trieToList (falsify t is)时，遍历时间保持线性，因为falsify最多会产生n + length (filter (<n) is)次额外比较，最多2 * n，假设严格增加is。

（旁注：广度优先遍历的空间要求相当痛苦，但我看不到一种简单的方法来帮助它，因为迭代深化在这里更糟糕，因为整个树必须保存在内存中，而bfs只需要记住树的底层）。

Answer 2

表示这一点的一种方法是作为一种功能。

true = const True

falsify ls is = \i -> not (i `elem` is) && ls i

check ls i = ls i

真实和伪造的功能非常有效。检查功能可以和线性一样糟糕。可以提高相同基本思想的效率。我喜欢它的优雅。

无限懒惰的位图

2 个答案: