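I am implementing an approximate counting algorithm:
Maintain t counters {X_1, ..., X_t}, each using log(log n) bits, and initialize all counters to 0.
When an item arrives, increment each X_i independently by 1 with probability (1/2)^{X_i}.
When the stream ends, output Z = (1/t) * ((2^{X_1} - 1) + ... + (2^{X_t} - 1)).
Repeat the above steps m times independently in parallel and output the median.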
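For reference, the reason for subtracting 1 from each term can be sketched as follows. The notation X^{(k)} for a single counter after k items is an assumption introduced here; the argument is the standard one for this kind of counter.

```latex
\begin{align*}
\mathbb{E}\!\left[2^{X^{(k+1)}} \,\middle|\, X^{(k)} = x\right]
  &= 2^{x+1}\cdot 2^{-x} + 2^{x}\left(1 - 2^{-x}\right) = 2^{x} + 1, \\
\mathbb{E}\!\left[2^{X^{(k+1)}}\right]
  &= \mathbb{E}\!\left[2^{X^{(k)}}\right] + 1, \qquad 2^{X^{(0)}} = 1, \\
\mathbb{E}\!\left[2^{X^{(n)}}\right] &= n + 1
  \quad\Longrightarrow\quad
  \mathbb{E}[Z] = \frac{1}{t}\sum_{i=1}^{t}\mathbb{E}\!\left[2^{X_i} - 1\right] = n .
\end{align*}
```

So each term 2^{X_i} - 1 is an unbiased estimate of the stream length n, and averaging t of them only reduces the variance.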
This is in Haskell, using the conduit library.
{-# LANGUAGE BangPatterns #-}
import Data.Random
import Data.Conduit
import Data.List
import Data.Ord (comparing)
import qualified Data.Conduit.List as Cl
import Control.Monad.Identity
type Prob = Double
type Counter = Float
type Delta = Double
type Eps = Double
-- * Run Morris alpha on stream inputs `xs`
morrisA :: [a] -> IO Counter
morrisA xs = flip runRVar StdRandom $ Cl.sourceList xs $$ alpha
-- * Run Morris beta on stream inputs `xs` for `t` independent trials and average
morrisB :: Int -> [a] -> IO Counter
morrisB t = fmap rmean . replicateM t . morrisA
-- * final morris algorithm
-- * Run on stream inputs `xs` for `t` independent trials, with `t = 1/(e^2 * d)`,
-- * and `m` times in parallel, with `m = 1/d`,
-- * and take the median
morris :: Eps -> Delta -> [a] -> IO Counter
morris e d = fmap rmedian . replicateM m . morrisB t
  where (t,m) = (round $ 1/(e^2*d), round $ 1/d)
-- * Utils * --
-- * A step in morris Algorithm alpha
alpha :: Sink a RVar Counter
alpha = (\x -> 2^(round x) - 1) <$> Cl.foldM (\x _ -> incr x) 0
-- * Increment a counter `x` with probability 1/2^x
incr :: Counter -> RVar Counter
incr x = do
  h <- (\q -> q <= (0.5^(round x) :: Prob)) <$> uniform 0 1
  return $ if h then succ x else x
rmean, rmedian :: (Floating a, Ord a, RealFrac a) => [a] -> Float
rmean = fromIntegral . round . mean
rmedian = fromIntegral . round . median
-- |Numerically stable mean
mean :: Floating a => [a] -> a
mean x = fst $ foldl' (\(!m, !n) x -> (m+(x-m)/(n+1),n+1)) (0,0) x
-- |Median
median :: (Floating a, Ord a) => [a] -> a
median x | odd n  = head $ drop (n `div` 2) x'
         | even n = mean $ take 2 $ drop i x'
  where i  = (length x' `div` 2) - 1
        x' = sort x
        n  = length x
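The problem is that the running time of morris is linear in both the length of the stream and the number of iterations t*m. For example, morrisA takes about 100 μs for 100 items. Now, if we want 95% confidence of at most 5% error, morris has to run morrisA n = t*m = 160000 times.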
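To make that figure concrete, here is a small sketch that recomputes the trial counts with the same formulas as the `where` clause of `morris`. The names `trialCounts` and `totalRuns` are hypothetical helpers introduced only for this illustration.

```haskell
-- Trial counts (t, m), using the same formulas as the `where` clause of
-- `morris`: t = 1/(e^2 * d) and m = 1/d.
-- `trialCounts` is a hypothetical helper, not part of the code above.
trialCounts :: Double -> Double -> (Int, Int)
trialCounts e d = (round (1 / (e ^ (2 :: Int) * d)), round (1 / d))

-- Total number of times `morrisA` would be run for the given e and d.
totalRuns :: Double -> Double -> Int
totalRuns e d = t * m
  where (t, m) = trialCounts e d
```

With e = d = 0.05 this gives t = 8000 and m = 20, so 160000 runs in total. At roughly 100 μs per run on a 100-item stream, that would be on the order of 16 seconds.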
Can someone suggest how to optimize this code? Perhaps:
Implementing morris in something other than conduit
A faster way of repeating morrisA than replicateM
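To illustrate what the first option might look like, here is a rough sketch of a single counter written as a strict left fold over the list, with no conduit and no RVar. Everything in it is an assumption made for illustration: the names `next`, `toUnit`, `step` and `morrisPure` are hypothetical, and the generator is a plain xorshift64 used only to keep the example self-contained, not a replacement for random-fu. Whether this is actually faster would have to be benchmarked.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.Bits (shiftL, shiftR, xor)
import Data.List (foldl')
import Data.Word (Word64)

-- Hypothetical stand-in generator: one xorshift64 step.
-- The state must be nonzero, since zero is a fixed point.
next :: Word64 -> Word64
next s0 = s3
  where
    s1 = s0 `xor` (s0 `shiftL` 13)
    s2 = s1 `xor` (s1 `shiftR` 7)
    s3 = s2 `xor` (s2 `shiftL` 17)

-- Map the top 53 bits of the state to a Double in [0, 1).
toUnit :: Word64 -> Double
toUnit w = fromIntegral (w `shiftR` 11) / 9007199254740992 -- 2^53

-- One update: increment the counter x with probability (1/2)^x.
-- The stream item itself is ignored, as in `alpha`.
step :: (Int, Word64) -> a -> (Int, Word64)
step (!x, !s) _ = (x', s')
  where
    s' = next s
    x' = if toUnit s' < 0.5 ^ x then x + 1 else x

-- Single-counter estimate 2^X - 1 for the stream `xs`,
-- starting from a nonzero seed.
morrisPure :: Word64 -> [a] -> Int
morrisPure seed xs = 2 ^ fst (foldl' step (0, seed) xs) - 1
```

In GHCi, `morrisPure 42 [1 .. 100 :: Int]` would give one such estimate. Averaging t of these and taking the median of m averages, each with a different seed, would mirror `morrisB` and `morris`.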