Optimizing this bit of conduit code for speed

Time: 2015-12-08 19:52:06

Tags: haskell optimization

I'm implementing an approximate counting algorithm:

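  • Maintain t counters $\{X_1, \ldots, X_t\}$, each using $\log(\log n)$ bits

  • Initialize all counters to 0

  • When an item arrives, independently increment each $X_i$ by 1 with probability $(1/2)^{X_i}$

  • When the stream ends, output $Z = \frac{1}{t}\left((2^{X_1} - 1) + \cdots + (2^{X_t} - 1)\right)$

  • Repeat the above $m$ times independently in parallel and output the median.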
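
The steps above can be sketched without conduit as one strict left fold over the stream that updates all t counters per item. This is a minimal sketch under stated assumptions, not the code from the question: `Gen`, `uniform01`, `onItem`, `estimateZ`, `runOnce`, `morrisSketch` and `median` are hypothetical names, and the tiny xorshift64* generator stands in for random-fu only so the example depends on `base` alone (a proper generator, e.g. from the random or mwc-random packages, is the better choice in real code).

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Bits (shiftL, shiftR, xor)
import Data.List (foldl', sort)
import Data.Word (Word64)

-- Hypothetical stand-in PRNG (xorshift64*); the state must never be 0.
newtype Gen = Gen Word64

-- A Double strictly inside (0, 1), plus the next generator state.
uniform01 :: Gen -> (Double, Gen)
uniform01 (Gen s0) = (u, Gen s3)
  where
    s1 = s0 `xor` (s0 `shiftR` 12)
    s2 = s1 `xor` (s1 `shiftL` 25)
    s3 = s2 `xor` (s2 `shiftR` 27)
    r  = s3 * 2685821657736338717
    u  = (fromIntegral (r `shiftR` 12) + 0.5) / 4503599627370496 -- 2^52

-- One arriving item: each X_i goes up by 1 independently with probability
-- (1/2)^X_i. New counter values are forced, so no thunks pile up.
onItem :: (Gen, [Int]) -> a -> (Gen, [Int])
onItem (g0, xs0) _ = go g0 xs0 []
  where
    go g [] acc = (g, acc) -- counter order does not matter, so no reverse
    go g (x : xs) acc =
      let (u, g') = uniform01 g
          !x'     = if u < 0.5 ^ x then x + 1 else x
      in go g' xs (x' : acc)

-- Z = (1/t) * sum over i of (2^X_i - 1)
estimateZ :: [Int] -> Double
estimateZ xs = sum [2 ^ x - 1 | x <- xs] / fromIntegral (length xs)

-- One pass over the stream with t fresh counters.
runOnce :: Int -> [a] -> Gen -> (Double, Gen)
runOnce t stream g0 =
  let (g1, xs) = foldl' onItem (g0, replicate t 0) stream
  in (estimateZ xs, g1)

-- m independent runs (the generator is threaded through), then the median.
morrisSketch :: Int -> Int -> [a] -> Gen -> Double
morrisSketch t m stream = median . runs m
  where
    runs 0 _ = []
    runs k g = let (z, g') = runOnce t stream g in z : runs (k - 1) g'

median :: [Double] -> Double
median zs
  | odd n     = s !! h
  | otherwise = (s !! (h - 1) + s !! h) / 2
  where
    s = sort zs
    n = length zs
    h = n `div` 2
```

In GHCi, `morrisSketch 2000 5 [1 .. 100 :: Int] (Gen 88172645463325252)` returns an estimate of the stream length (100 here). Note that this version still performs on the order of n·t·m coin flips, because every repetition re-reads the stream; at best it removes conduit and RVar overhead.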

Here it is in Haskell, using the conduit library:

{-# LANGUAGE BangPatterns #-}

import Data.Random
import Data.Conduit
import Data.List
import Data.Ord (comparing)
import qualified Data.Conduit.List as Cl

import Control.Monad (replicateM)


type Prob    = Double
type Counter = Float
type Delta   = Double
type Eps     = Double


-- * Run Morris alpha on stream inputs `xs`
morrisA :: [a] -> IO Counter
morrisA xs = flip runRVar StdRandom $ Cl.sourceList xs $$ alpha

-- * Run Morris beta on stream inputs `xs` for `t` independent trials and average
morrisB :: Int -> [a] -> IO Counter
morrisB t =  fmap rmean . replicateM t . morrisA

-- * Final Morris algorithm:
-- * run on stream inputs `xs` for `t = 1/(e^2 * d)` independent trials and average,
-- * repeat that `m = 1/d` times independently,
-- * and take the median

morris :: Eps -> Delta -> [a] -> IO Counter
morris e d = fmap rmedian . replicateM m . morrisB t 
  where (t,m) = (round $ 1/(e^2*d), round $ 1/d)

-- * Utils * -- 

-- * Morris algorithm alpha as a Sink: fold `incr` over the stream, then output 2^X - 1
alpha :: Sink a RVar Counter
alpha = (\x -> 2^(round x) - 1) <$> Cl.foldM (\x _ -> incr x) 0

-- * Increment a counter `x` with probability 1/2^x
incr :: Counter -> RVar Counter
incr x = do
  h <- (\q -> q <= (0.5^(round x) :: Prob)) <$> uniform 0 1
  return $! if h then x + 1 else x  -- force the new count so thunks don't build up

rmean, rmedian :: (Floating a, Ord a, RealFrac a) => [a] -> Float
rmean   = fromIntegral . round . mean
rmedian = fromIntegral . round . median

-- |Numerically stable mean
mean :: Floating a => [a] -> a
mean x = fst $ foldl' (\(!m, !n) x -> (m+(x-m)/(n+1),n+1)) (0,0) x

-- |Median
median :: (Floating a, Ord a) => [a] -> a
median x | odd n  = head  $ drop (n `div` 2) x'
         | even n = mean $ take 2 $ drop i x'
                  where i = (length x' `div` 2) - 1
                        x' = sort x
                        n  = length x

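The problem is that the running time of morris is linear both in the length of the stream and in the number of iterations t*m. For example, morrisA takes about 100 μs for 100 items. If we want 95% confidence with 5% error, we have to run morrisA n = t*m = 160000 times.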
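
For reference, the parameter choice in the code of morris (t = 1/(ε²δ), m = 1/δ) works out as follows for ε = δ = 0.05. The last line only scales the quoted ~100 μs per 100-item morrisA run, so it is an extrapolation rather than a measurement.

```latex
\begin{aligned}
t &= \frac{1}{\varepsilon^{2}\delta} = \frac{1}{0.05^{2} \cdot 0.05} = \frac{1}{0.000125} = 8000,
\qquad m = \frac{1}{\delta} = 20, \\
t \cdot m &= 8000 \cdot 20 = 160\,000 \text{ runs of morrisA}, \\
160\,000 \times 100\,\mu\text{s} &= 16\ \text{s for a 100-item stream.}
\end{aligned}
```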

Can anyone suggest how to optimize this code? Perhaps by:

  1. Implementing morris with something other than conduit

  2. Finding a faster way than replicateM to repeat morrisA.
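
One possible direction for item 2, a swapped-in technique rather than something the question proposes: a counter's evolution depends only on how many items arrive, never on their values, so each counter can jump straight to its next increment instead of flipping a coin per item. With the counter at $X$, the number of items until the next increment is geometric with success probability $p = 2^{-X}$, sampled here as $\lceil \ln U / \ln(1 - p) \rceil$ for $U$ uniform in $(0, 1)$. A counter then costs roughly $\log_2 n$ random draws instead of $n$, and the t·m repetitions no longer replay the stream. This sketch assumes the stream length n is known up front (for a list, `length xs`) and stays far below $2^{52}$; `counterAfter`, `estimateZ`, `morrisSkip` and the xorshift64* `Gen` are hypothetical names, the generator being a base-only stand-in for random-fu.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Bits (shiftL, shiftR, xor)
import Data.List (sort)
import Data.Word (Word64)

-- Hypothetical stand-in PRNG (xorshift64*); the state must never be 0.
newtype Gen = Gen Word64

-- A Double strictly inside (0, 1), plus the next generator state.
uniform01 :: Gen -> (Double, Gen)
uniform01 (Gen s0) = (u, Gen s3)
  where
    s1 = s0 `xor` (s0 `shiftR` 12)
    s2 = s1 `xor` (s1 `shiftL` 25)
    s3 = s2 `xor` (s2 `shiftR` 27)
    r  = s3 * 2685821657736338717
    u  = (fromIntegral (r `shiftR` 12) + 0.5) / 4503599627370496 -- 2^52

-- Final value of one Morris counter after n items, by geometric skipping:
-- at value x the gap to the next increment is Geometric(2^-x) on {1, 2, ..}.
-- Assumes n is far below 2^52, so 1 - p never rounds to 1.
counterAfter :: Int -> Gen -> (Int, Gen)
counterAfter n = go 0 n
  where
    go !x !left g
      | left <= 0 = (x, g)
      | otherwise =
          let p       = 0.5 ^ x :: Double
              (u, g') = uniform01 g
              gap | x == 0    = 1 -- p = 1: the first item always increments
                  | otherwise = ceiling (log u / log (1 - p)) :: Int
          in if gap <= left then go (x + 1) (left - gap) g' else (x, g')

-- Average of (2^X_i - 1) over t independent counters, each fed n items.
estimateZ :: Int -> Int -> Gen -> (Double, Gen)
estimateZ t n g0 = go t g0 0
  where
    go :: Int -> Gen -> Double -> (Double, Gen)
    go 0 g !acc = (acc / fromIntegral t, g)
    go k g !acc =
      let (x, g') = counterAfter n g
      in go (k - 1) g' (acc + 2 ^ x - 1)

-- Same parameters as the question's morris: t = 1/(e^2 d), m = 1/d;
-- returns the median of m independent averages.
morrisSkip :: Double -> Double -> Int -> Gen -> Double
morrisSkip e d n = median . runs m
  where
    t = round (1 / (e * e * d)) :: Int
    m = round (1 / d) :: Int
    runs :: Int -> Gen -> [Double]
    runs 0 _ = []
    runs k g = let (z, g') = estimateZ t n g in z : runs (k - 1) g'

median :: [Double] -> Double
median zs
  | odd n     = s !! h
  | otherwise = (s !! (h - 1) + s !! h) / 2
  where
    s = sort zs
    n = length zs
    h = n `div` 2
```

For the ε = δ = 0.05 case, `morrisSkip 0.05 0.05 (length xs) g` still builds 160000 counters, but each one consumes only about as many random numbers as its final value (around log2 n) instead of one per stream item. If the stream length is not known in advance, the per-item coin flips are still needed.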

0 answers