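I have a Vector of sorted values, for example
fromList [1, 2, 4, 5]
Now I'd like to insert another value, say 3, and create a new vector. In an imperative language, I'd allocate an array of size 5, loop over the original vector copying the old values, and insert the new value at the proper place, so that I get
fromList [1, 2, 3, 4, 5]
I can do this with the vector API as
let (pre, post) = V.span (< x) n
 in V.concat [pre, pure x, post]
which works, but traverses the original vector twice: once when searching for the split point and once when combining. Is there a way to do this in a single pass? (Another solution would be to find the split point with binary search, but I'm interested in whether a genuinely single-pass solution is possible.)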
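For comparison, here is a sketch of the binary-search alternative mentioned in the parenthetical. It is not from the question itself: lowerBound and insertSorted are hypothetical names, and the code assumes the input vector is sorted in ascending order. The search reads only O(log n) elements, V.splitAt produces O(1) slices, and V.concat then performs the single full copy.

```haskell
import qualified Data.Vector as V
import Data.Vector (Vector)

-- | Index of the first element that is >= x (hypothetical helper).
-- Assumes v is sorted in ascending order; returns V.length v if there is none.
lowerBound :: Ord a => a -> Vector a -> Int
lowerBound x v = go 0 (V.length v)
  where
    -- Invariant: every element before lo is < x, every element from hi on is >= x.
    go lo hi
      | lo >= hi = lo
      | v V.! mid < x = go (mid + 1) hi
      | otherwise = go lo mid
      where
        mid = (lo + hi) `div` 2

-- | Insert x before the first element >= x, i.e. the same split as V.span (< x).
insertSorted :: Ord a => a -> Vector a -> Vector a
insertSorted x v = V.concat [pre, V.singleton x, post]
  where
    -- splitAt only creates O(1) slices; concat does the one copy of the data.
    (pre, post) = V.splitAt (lowerBound x v) v
```

Because elements equal to x end up in post, x is placed before them, exactly as with the span-based version.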
Answer 0 (score: 3)
The best tool available seems to be unfoldr, for example:
import qualified Data.Vector as V
import Data.Vector (Vector)

insertElem :: Int -> Vector Int -> Vector Int
insertElem e v = V.unfoldrN (len+1) go (0,False)
  where
    len = V.length v
    -- state: (index into v, whether e has been emitted yet)
    go (i,found)
      | i >= len  = if found then Nothing else Just (e, (i+1, True))
      | found     = Just (x, (i+1, True))
      | x <= e    = Just (x, (i+1, False))
      | otherwise = Just (e, (i, True))   -- emit e without advancing i
      where x = v V.! i

test1 = insertElem 3 (V.fromList [1,2,4,5])  -- == V.fromList [1,2,3,4,5]
test2 = insertElem 0 (V.fromList [1,2,4,5])  -- == V.fromList [0,1,2,4,5]
test3 = insertElem 6 (V.fromList [1,2,4,5])  -- == V.fromList [1,2,4,5,6]
I didn't put much effort into simplifying the logic in the go function.
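One way to tidy the go function, offered as a sketch rather than the answerer's code: carry the not-yet-inserted element in a Maybe instead of a separate Bool flag (insertElem' is a hypothetical name). It keeps the original's behavior of placing e after any equal elements, and like the original it still writes the reads v V.! i into the result lazily once e has been emitted.

```haskell
import qualified Data.Vector as V
import Data.Vector (Vector)

-- | Same unfoldrN idea; the state is (next source index, element still to insert).
insertElem' :: Ord a => a -> Vector a -> Vector a
insertElem' e v = V.unfoldrN (n + 1) step (0, Just e)
  where
    n = V.length v
    -- e still pending: copy source elements <= e, then emit e without advancing i.
    step (i, Just p)
      | i < n, let x = v V.! i, x <= p = Just (x, (i + 1, Just p))
      | otherwise = Just (p, (i, Nothing))
    -- e already emitted: copy the remaining source elements.
    step (i, Nothing)
      | i < n = Just (v V.! i, (i + 1, Nothing))
      | otherwise = Nothing
```

The Maybe makes the impossible state "already inserted, but still holding e" unrepresentable, which removes one of the four guards.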
Answer 1 (score: 3)
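user5402's answer is a very nice approach, but it falls prey to an efficiency problem described in the Data.Vector documentation. Specifically, once it has found the insertion point and is copying blindly, it no longer forces the values to actually be read from the source vector. Instead, it fills the destination vector with thunks that, when forced, read from the source vector.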
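Note: this is the first solution I came up with. It's easy to understand, but it doesn't work well with the stream fusion system in vector, which can lead to relatively poor performance. The solutions further below are better overall.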
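As the documentation explains, one solution is to use the monadic indexM operation for these blind reads. That forces the read to happen, but does not force the value that is read. So it copies a pointer (possibly a pointer to a thunk) from the source vector to the destination vector. For maximum efficiency, everything below should be replaced with its unsafe variant (in particular unsafeIndexM, unsafeIndex and unsafeWrite).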
{-# Language ScopedTypeVariables #-}
module Insert where

import qualified Data.Vector as V
import Data.Vector (Vector)
import qualified Data.Vector.Mutable as MV
import Data.Vector.Mutable (MVector)
import Control.Monad.ST

insertElem :: forall a . Ord a => a -> Vector a -> Vector a
insertElem e v = V.create act
  where
    act :: forall s . ST s (MVector s a)
    act = do
      mv <- MV.new (V.length v + 1)
      let
        -- Copy elements smaller than e; write e at the first position
        -- where it fits (or at the end), then switch to finish.
        start :: Int -> ST s (MVector s a)
        start i
          | i == V.length v ||
            e <= v V.! i = MV.write mv i e >> finish i
          | otherwise = MV.write mv i (v V.! i) >> start (i+1)
        -- Copy the rest, shifted by one. indexM forces the read from v
        -- (but not the element itself), so no thunk referring to v is stored.
        finish :: Int -> ST s (MVector s a)
        finish i
          | i == V.length v = return mv
          | otherwise = do
              V.indexM v i >>= MV.write mv (i+1)
              finish (i+1)
      start 0

insertElemInt :: Int -> Vector Int -> Vector Int
insertElemInt = insertElem
Note that naming the act action and using ScopedTypeVariables isn't actually necessary, but I found them very helpful for tracking down my mistakes.
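Following the advice to switch to the unsafe variants, here is a sketch of the same ST-based algorithm with bounds checks removed (insertElemUnsafe is a hypothetical name). V.unsafeIndex, V.unsafeIndexM and MV.unsafeWrite skip bounds checking, so this is only safe because every index stays within [0, n] for the destination and [0, n - 1] for the source. It also drops ScopedTypeVariables and the named act, which, as noted, aren't required.

```haskell
import Control.Monad.ST (ST)
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as MV
import Data.Vector (Vector)

-- | Same algorithm as the ST version above, without bounds checks.
insertElemUnsafe :: Ord a => a -> Vector a -> Vector a
insertElemUnsafe e v = V.create (do
  mv <- MV.new (n + 1)
  let start i
        -- i == n is tested first, so unsafeIndex never sees i == n.
        | i == n || e <= V.unsafeIndex v i = MV.unsafeWrite mv i e >> finish i
        | otherwise = MV.unsafeWrite mv i (V.unsafeIndex v i) >> start (i + 1)
      finish i
        | i == n = return mv
        | otherwise = do
            x <- V.unsafeIndexM v i  -- forces the read, not the element
            MV.unsafeWrite mv (i + 1) x
            finish (i + 1)
  start 0)
  where
    n = V.length v
```

Whether the removed checks make a measurable difference depends on the workload; the logic is identical to the checked version.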
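The code above won't fuse well, because it indexes all over the place. The following approach should fuse properly while still avoiding unnecessary thunks. This is the first time I've touched stream fusion code, so there may be room for improvement.
The only problem with the first stream-based version is that, if it does in fact fuse, the input stream's step function will run twice on one of the elements. That's usually not a problem, but it could be if the step function is very expensive (which is rare). So I also give an alternative that should do better in that case. I'm not sure which will be better in practice, so I've included both.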
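With either of these stream-based solutions, the code
testEnum :: Word -> Word -> Word -> Word
testEnum ins low high = V.product $
  insertElem ins $ V.enumFromStepN low 1 (fromIntegral high)
will compile into a loop that runs in constant space and never actually creates a vector.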
The first stream-based version:
{-# Language ScopedTypeVariables #-}
module Insert where

import Data.Vector (Vector)
import Data.Word (Word)
import qualified Data.Vector.Fusion.Stream.Monadic as S
import qualified Data.Vector.Generic as G
import Data.Vector.Fusion.Util (Id(..))

-- To check on unboxing and such
insertElemWord :: Word -> Vector Word -> Vector Word
insertElemWord = insertElem

{-# INLINE insertElem #-}
insertElem :: forall a . Ord a => a -> Vector a -> Vector a
insertElem a v = G.unstream (insertElemS a (G.stream v))

{-# INLINE [1] insertElemS #-}
insertElemS :: forall a . Ord a => a -> S.Stream Id a -> S.Stream Id a
insertElemS e (S.Stream step (state::s) size) = S.Stream step' (state, False) (size + 1)
  where
    {-# INLINE [0] step' #-}
    step' :: (s, Bool) -> Id (S.Step (s, Bool) a)
    -- e has already been emitted: pass the rest through unchanged.
    step' (s, True) = Id $ case unId (step s) of
      S.Yield a s' -> S.Yield a (s', True)
      S.Skip s'    -> S.Skip (s', True)
      S.Done       -> S.Done
    -- e not yet emitted: emit e before the first element >= e. The old state s
    -- is kept (not s'), so step runs on that element a second time.
    step' (s, False) = Id $ case unId (step s) of
      S.Yield a s' ->
        if e <= a
          then S.Yield e (s, True)
          else S.Yield a (s', False)
      S.Skip s' -> S.Skip (s', False)
      S.Done    -> S.Yield e (s, True)
The alternative version, which runs the input stream's step function only once per element:
{-# Language ScopedTypeVariables #-}
module Insert where

import Data.Vector (Vector)
import Data.Word (Word)
import qualified Data.Vector.Fusion.Stream.Monadic as S
import qualified Data.Vector.Generic as G
import Data.Vector.Fusion.Util (Id(..))

-- Pre: e not yet emitted; During: e emitted, element a still pending;
-- Post: copying the rest; End: finished.
data Status s a = Pre s | During s a | Post s | End

insertElemWord :: Word -> Vector Word -> Vector Word
insertElemWord = insertElem

{-# INLINE insertElem #-}
insertElem :: forall a . Ord a => a -> Vector a -> Vector a
insertElem a v = G.unstream (insertElemS a (G.stream v))

{-# INLINE [1] insertElemS #-}
insertElemS :: forall a . Ord a => a -> S.Stream Id a -> S.Stream Id a
insertElemS e (S.Stream step (state::s) size) = S.Stream step' (Pre state) (size+1)
  where
    {-# INLINE [0] step' #-}
    step' :: Status s a -> Id (S.Step (Status s a) a)
    step' (Post s) = Id $ case unId (step s) of
      S.Yield a s' -> S.Yield a (Post s')
      S.Skip s'    -> S.Skip (Post s')
      S.Done       -> S.Done
    step' (Pre s) = Id $ case unId (step s) of
      S.Yield a s'
        | e <= a    -> S.Yield e (During s' a)
        | otherwise -> S.Yield a (Pre s')
      S.Skip s' -> S.Skip (Pre s')
      S.Done    -> S.Yield e End
    step' (During s a) = Id (S.Yield a (Post s))
    step' End = Id S.Done
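The Status-based state machine above doesn't depend on anything vector-specific. To look at it in isolation, here is a self-contained model that needs only base. It uses a hypothetical minimal Stream type, not vector's real one: vector's stream is monadic and, in the release this answer targets, also carries a size hint (newer vector releases keep that hint in a separate Bundle type, so the code above may need adjusting for them). fromListS, toListS and insertS are illustrative names.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Hypothetical minimal stream: a step function plus a hidden state,
-- with no monad and no size hint (unlike vector's internal type).
data Step s a = Yield a s | Skip s | Done

data Stream a = forall s. Stream (s -> Step s a) s

fromListS :: [a] -> Stream a
fromListS xs = Stream next xs
  where
    next [] = Done
    next (y : ys) = Yield y ys

toListS :: Stream a -> [a]
toListS (Stream step s0) = go s0
  where
    go s = case step s of
      Yield a s' -> a : go s'
      Skip s' -> go s'
      Done -> []

-- Same states as the Status type in the second stream-based version.
data Status s a = Pre s | During s a | Post s | End

insertS :: Ord a => a -> Stream a -> Stream a
insertS e (Stream step s0) = Stream step' (Pre s0)
  where
    step' (Pre s) = case step s of
      Yield a s'
        | e <= a -> Yield e (During s' a)  -- hold a; step is not re-run on it
        | otherwise -> Yield a (Pre s')
      Skip s' -> Skip (Pre s')
      Done -> Yield e End                  -- e is larger than every element
    step' (During s a) = Yield a (Post s)
    step' (Post s) = case step s of
      Yield a s' -> Yield a (Post s')
      Skip s' -> Skip (Post s')
      Done -> Done
    step' End = Done
```

Since the pending element travels in the During state together with the already-advanced state s', each input element is produced by step exactly once, which is the point of the second version.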