Question

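The default sort, Data.List.sort, is implemented using merge sort. I have a huge list of almost sorted data, and I thought using insertion sort would be more beneficial. Unfortunately, that is not the case.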

Edit: OK. So with -O2, my algorithm seems to be about twice as fast when writing to /dev/null.

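length . mySort $ [0..1000000] takes 18.5 seconds and consumes about 13 GB of memory, whereas the same call with Data.List.sort takes 1.8 seconds and uses about 1.2 GB. That is more than a 10x difference in performance. What am I doing wrong? Here is my code:

length . sort $ [0..1000000]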

Answer 1

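The following self-contained program, compiled with ghc -O2 using GHC 8.2.2, runs in about a tenth of a second and allocates 176M on the heap. If I bump it up to [0..10000000] instead of [0..1000000], it runs in about a second and allocates 1.8G on the heap. If I run the big version (ten million) under GHCi after :set +s, then I roughly reproduce your results: 12.1 seconds and 12.9 gigs. If you are doing your timing tests under GHCi, don't! GHCi compiles to unoptimized, interpreted bytecode.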

import Control.Monad
import qualified Data.Vector as V
import qualified Data.Vector.Mutable as Mv

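-- Sort a mutable vector in place: for each index x, move the element
-- at x leftwards until it sits after an element that is not larger.
mvInsertionSort :: Ord a => Mv.IOVector a -> IO (Mv.IOVector a)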
mvInsertionSort mv = do
    forM_ [1 .. Mv.length mv - 1] $ \x -> do
        pivot <- Mv.read mv x
        mvInsertionSort' [x-1, x-2 .. 0] mv pivot
    return mv

-- Insertion sort helper: walk the indices to the left of the pivot,
-- swapping the pivot past each larger element until its slot is found.
mvInsertionSort' :: Ord a => [Int] -> Mv.IOVector a -> a -> IO ()
mvInsertionSort' (y:ys) mv pivot = do
    currElem <- Mv.read mv y
    if pivot < currElem
        then do
            Mv.write mv y pivot
            Mv.write mv (y+1) currElem
            mvInsertionSort' ys mv pivot
        else Mv.write mv (y+1) pivot
mvInsertionSort' [] _ _ = return ()

main :: IO ()
main = do
  let v = V.fromList [0..1000000]  -- one million
  v' <- V.freeze =<< mvInsertionSort =<< V.thaw v
  print $ V.length v'
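
If you want a timing from inside the compiled binary rather than from GHCi's :set +s, something along these lines may help. It is only a minimal sketch: the helper name timed is made up for illustration, it measures CPU time via getCPUTime from base, and it times Data.List.sort so that it needs nothing beyond base. Swap in your own sort to compare.

```haskell
import Control.Exception (evaluate)
import Data.List (sort)
import System.CPUTime (getCPUTime)

-- | Run an action and return its result together with the CPU time
-- it took, in seconds. (Hypothetical helper, not from any library.)
timed :: IO a -> IO (a, Double)
timed act = do
  start <- getCPUTime              -- CPU time in picoseconds
  r     <- act
  end   <- getCPUTime
  return (r, fromIntegral (end - start) / 1e12)

main :: IO ()
main = do
  -- evaluate forces the Int result, so the sort really runs inside
  -- the timed action instead of being deferred by laziness.
  (n, secs) <- timed (evaluate (length (sort [0 .. 1000000 :: Int])))
  putStrLn ("sorted " ++ show n ++ " elements in " ++ show secs ++ " s")
```

Build it with ghc -O2 and run the resulting executable. If you also want heap allocation figures like the ones quoted above, linking with -rtsopts and running the program with +RTS -s should print them.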

Answer 2

Because merge sort and quicksort have O(n * log(n)) complexity, whereas insertion sort is O(n*n).
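
To make the O(n*n) figure concrete, here is a small sketch that counts the comparisons an insertion sort performs. It is not the code from the question: insertDesc and insertionSortCount are illustrative names, and the sorted prefix is kept as a list in descending order so that each new element is compared against the largest element first, similar to scanning leftwards in an array. Note that the quadratic bound is the worst case (reversed input); on input that is already sorted, this variant makes only n - 1 comparisons.

```haskell
import Data.List (foldl')

-- | Insert x into a list held in descending order. Returns the new
-- list and the number of comparisons made.
insertDesc :: Ord a => a -> [a] -> ([a], Int)
insertDesc x [] = ([x], 0)
insertDesc x (y:ys)
  | x < y     = let (rest, c) = insertDesc x ys in (y : rest, c + 1)
  | otherwise = (x : y : ys, 1)

-- | Insertion sort returning the ascending result and the total
-- number of comparisons.
insertionSortCount :: Ord a => [a] -> ([a], Int)
insertionSortCount = finish . foldl' step ([], 0)
  where
    step (acc, n) x =
      let (acc', c) = insertDesc x acc
          n'        = n + c
      in n' `seq` (acc', n')          -- keep the counter evaluated
    finish (acc, n) = (reverse acc, n)

main :: IO ()
main = do
  let n = 1000 :: Int
  print (snd (insertionSortCount [1 .. n]))         -- sorted input: 999
  print (snd (insertionSortCount [n, n - 1 .. 1]))  -- reversed input: 499500
```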

Why is my insertion sort slower than the library merge sort on (almost) sorted data?
