Intersect vs Data.Set Intersection Runtime

Asked: 2017-01-12 17:47:45

Tags: haskell set intersection

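Based on another thread on Stack Exchange, I tried to improve the runtime of a repeated intersection by using Data.Set.intersection instead of the list-based intersect (which lives in Data.List, not the Prelude) on short lists of integers (fewer than 5 elements each). Trying a simple example in the GHCi console, I see a dramatic performance difference:

intersect [1..10000] [5000..200000]
(16.04 secs, 21919236 bytes)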

S.intersection (S.fromList [1..10000]) (S.fromList [5000..200000])
(0.25 secs, 125337020 bytes)
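
The gap on large inputs is consistent with the asymptotics: Data.List.intersect looks up each element of the first list in the second with a linear scan, so it does on the order of n·m comparisons, while the set version pays for building two balanced trees and then intersects them. A minimal sketch on a small input, showing that both routes return the same elements (the names `viaList` and `viaSet` are illustrative and not part of the program in this question):

```haskell
import Data.List (intersect)
import qualified Data.Set as S

-- List-based: each element of xs is searched for in ys with a linear scan.
viaList :: [Integer] -> [Integer] -> [Integer]
viaList = intersect

-- Set-based: build two sets, intersect them, and convert back to a list.
-- The result is sorted and free of duplicates.
viaSet :: [Integer] -> [Integer] -> [Integer]
viaSet xs ys = S.toList (S.intersection (S.fromList xs) (S.fromList ys))

main :: IO ()
main = do
  print (viaList [1..10] [5..20])  -- prints [5,6,7,8,9,10]
  print (viaSet  [1..10] [5..20])  -- prints [5,6,7,8,9,10]
```

Note that the set version includes the cost of both `S.fromList` calls, which matters once the lists are tiny.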

I was hoping to modify my algorithm and get a similarly dramatic performance gain when intersecting small integer lists:

--Check takes a list of integers and then checks if another integer n is
--"swappable"  Swappable integers are two integers that, when concatenated, are prime. Any new sets created with n are returned
check ::[[Integer]] -> Integer -> [[Integer]]
check [] n = [[n]]
check (x:xs) n
         |nonswappable == [] = (x++[n]):(check xs (n))
         |otherwise  = (check (removecandidates xs) n)
         where
             --nonswappable returns null if the input n is swappable with everything
             --isswappable:: Integer->Integer->Bool This function concatenates two numbers and checks if they are prime.
             nonswappable = snd(partition (isswappable n) x)
             --removecandidates strikes out candidate lists in the list containing a nonswappable number in the set just checked
             removecandidates list = filter ((==[]).(intersect nonswappable))  list

Relevant profile:

    total time  =        5.30 secs   (5304 ticks @ 1000 us, 1 processor)
    total alloc = 2,676,617,968 bytes  (excludes profiling overheads)

COST CENTRE             MODULE  %time %alloc

powm                    Main     29.3   36.2
isswappable             Main     26.3   25.3
check.removecandidates  Main     15.9   18.5
probmrisPrime.randomAs  Main      6.9    3.8



                                                                               individual     inherited
COST CENTRE                       MODULE                     no.     entries  %time %alloc   %time %alloc

  primes                          Main                       110           1    0.0    0.0     0.2    0.5
   sieve                          Main                       111         573    0.2    0.5     0.2    0.5
  findsets                        Main                       104           1    0.0    0.0    99.8   99.5
   sets                           Main                       105         571    0.2    0.2    99.8   99.5
    check                         Main                       106       55704    0.2    0.1    99.6   99.3
     check.removecandidates       Main                       126       52920   15.9   18.5    15.9   18.5
     check.nonswappable           Main                       107       55135    0.4    0.2    83.5   80.7

The version changed to use Data.Set, which I expected to give a speedup similar to the one above:

--Check takes a list of integers and then checks if another integer n is
--"swappable"  Swappable integers are two integers that, when concatenated, are prime. Any new sets created with n are returned
check ::[[Integer]] -> Integer -> [[Integer]]
check [] n = [[n]]
check (x:xs) n
         |nonswappable == [] = (x++[n]):(check xs (n))
         |otherwise  = (check (removecandidates xs) n)
         where
               --nonswappable returns null if the input n is swappable with everything
               --isswappable:: Integer->Integer->Bool This function concatenates two numbers and checks if they are prime.
              nonswappable = snd(partition (isswappable n) x)
               --removecandidates strikes out candidate lists in the list containing a nonswappable number in the set just checked
              removecandidates list = filter ((==[]).(S.toList.S.intersection setnonswappable.setx))  list
                  where
                     setnonswappable = S.fromList nonswappable
                     setx lst = S.fromList lst

Relevant profiler details:

total time  =        7.36 secs   (7361 ticks @ 1000 us, 1 processor)
total alloc = 3,019,801,528 bytes  (excludes profiling overheads)

COST CENTRE                 MODULE  %time %alloc

check.removecandidates      Main     30.7   21.8
powm                        Main     21.6   32.1
isswappable                 Main     20.6   22.4
check.removecandidates.setx Main      6.8    5.9
probmrisPrime.randomAs      Main      4.9    3.4



        individual     inherited
    COST CENTRE                                  MODULE                     no.     entries  %time %alloc   %time %alloc

    MAIN                                         MAIN                                     63           0    0.0    0.0     
      primes                                     Main                       112           1    0.0    0.0     0.3    0.4
       sieve                                     Main                       113         573    0.3    0.4     0.3    0.4
      findsets                                   Main                       106           1    0.0    0.0    99.7   99.6
       sets                                      Main                       107         571    0.3    0.2    99.7   99.6
        check                                    Main                       108       55704    0.2    0.1    99.4   99.4
         check.removecandidates                  Main                       128       52920   30.7   21.8    37.5   27.8
          check.removecandidates.setx            Main                       130     4949177    6.8    5.9     6.8    5.9
          check.removecandidates.setnonswappable Main                       129       52446    0.0    0.0     0.0    0.0

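Performance degraded to the point that the function calling Data.Set.intersection (check.removecandidates) rose to become the largest cost centre! What suggestions can you offer for pinpointing this serious performance problem? Also, is there an entirely different approach that would test the intersection of many small integer lists more efficiently?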
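
One reading of the second profile: `check.removecandidates.setx` was entered 4,949,177 times, so a fresh Set is built for every candidate list, and for lists of fewer than five elements that construction may well cost more than the intersection saves. A hedged sketch of one alternative, not benchmarked against the full program: build the set of nonswappable numbers once per call and test membership directly, without converting each candidate to a Set. The names `sharesAny` and `removeCandidates'` are hypothetical.

```haskell
import qualified Data.Set as S

-- True if any element of the candidate list occurs in the given set.
-- `any` stops at the first hit, so no full intersection is computed.
sharesAny :: S.Set Integer -> [Integer] -> Bool
sharesAny bad = any (`S.member` bad)

-- Keep only the candidates that contain none of the nonswappable numbers.
removeCandidates' :: [Integer] -> [[Integer]] -> [[Integer]]
removeCandidates' nonswappable = filter (not . sharesAny bad)
  where
    bad = S.fromList nonswappable  -- built once, shared by every candidate

main :: IO ()
main = print (removeCandidates' [7, 13] [[3, 7], [3, 11], [13], [3, 37, 67]])
-- prints [[3,11],[3,37,67]]
```

For inputs this small, a plain ``any (`elem` nonswappable)`` on lists may be just as fast; only profiling the real workload can settle that.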

As requested in the comments, here is the complete executable code with all the relevant functions.

module Main where

import Data.List
import System.Random
import qualified Data.Set as S

primes::[Integer]
primes = sieve (2:[3,5..])
sieve []=[]
sieve (x:xs) = x:sieve[y|y<-xs, (y `rem` x)/=0]


--isswappable using a probabilistic primality test
isswappable:: Integer->Integer->Bool
isswappable x y =  (probmrisPrime 3 (read ((show x) ++ (show y))) == "prob prime") &&
                        ((probmrisPrime 3 (read ((show y) ++ (show x))) == "prob prime"))



--Miller-Rabin probabilistic primality test
--Input: n > 3, an odd integer to be tested for primality;
--Input: k, a parameter that determines the accuracy of the test
--random seed for the Miller-Rabin test
mygen = mkStdGen 32123432141312332130

probmrisPrime k n
    | n `mod` 2 == 0 = "composite"
    |otherwise = witnessloop n randomAs
    where randomAs = take k (randomRs (2, (n-2)) (mygen))::[Integer]

witnessloop _ [] = "prob prime"
witnessloop n (hd:lst)
    | x == 1 || x==n-1 = witnessloop n (lst)
    | otherwise = checks (s-1) x
    where x = powm hd d n 1
          d = millerrabined n
          s = millerrabines n
          checks 0 _ = "composite"
          checks s x
            | newx == 1 = "composite"
            | newx == n-1 = witnessloop n (lst)
            | otherwise = checks (s-1) newx
            where newx = (x^2) `mod` n

millerrabines num=factor2 (num-1)
millerrabined num=quot (num-1) (2^(millerrabines num))

factor2 num
    | num `rem` 2 ==0 = 1 + factor2 (quot num 2)
    | otherwise = 0

powm :: Integer -> Integer -> Integer -> Integer -> Integer
powm b 0 m r = r
powm b e m r | e `mod` 2 == 1 = powm (b * b `mod` m) (e `div` 2) m (r * b `mod` m)
powm b e m r = powm (b * b `mod` m) (e `div` 2) m r




sets :: Int->[[Integer]]
--sets 1 = [[3]]
--checks all the previous sets with the next prime number and appends any new sets with n to the list
sets 2 = [[3]]
--start with prime == 7 since any prime with 5 will always divide by 5
sets n = check (take (n-2) findsets) (primes!!n)
check [] n = [[n]]
check (x:xs) n
         |nonswappable == [] = (x++[n]):(check xs (n))
         |otherwise  = (check (removecandidates xs) n)
         where
         --returns null if the input n is swappable with everything
               nonswappable = snd(partition (isswappable n) x)
               removecandidates list = [x| x<-list, intersect nonswappable x == [] ]
         {-replacement logic using Data.Set
              removecandidates list = filter ((==[]).(intersect nonswappable))  list
              removecandidates list = filter ((==[]).(S.toList.S.intersection setnonswappable.setx))  list
                  where
                     setnonswappable = S.fromList nonswappable--                     
                     setx lst = S.fromList lst -}

--generate all the sets
findsets= concat(map sets [2..])

--find sets of certain length
findGroups list len = [x| x<-list, length x >= len]

main = do
      putStrLn $ show ( take 1 (findGroups (findsets) 5))
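
To make the "swappable" test concrete: 3 and 7 are swappable because both 37 and 73 are prime, while 3 and 5 are not because 35 = 5 * 7. The sketch below illustrates only the concatenation step. It swaps in naive trial division (`isPrimeNaive`, a hypothetical helper) for the Miller-Rabin test in the program above, and it returns Bool instead of comparing result strings.

```haskell
-- Naive trial division; a stand-in for probmrisPrime, suitable for small inputs only.
isPrimeNaive :: Integer -> Bool
isPrimeNaive n
  | n < 2     = False
  | otherwise = all (\d -> n `mod` d /= 0) (takeWhile (\d -> d * d <= n) [2 ..])

-- Concatenate the decimal digits of x and y, e.g. concatDigits 3 7 == 37.
concatDigits :: Integer -> Integer -> Integer
concatDigits x y = read (show x ++ show y)

-- Swappable: both concatenation orders are prime.
swappableNaive :: Integer -> Integer -> Bool
swappableNaive x y =
  isPrimeNaive (concatDigits x y) && isPrimeNaive (concatDigits y x)

main :: IO ()
main = do
  print (swappableNaive 3 7)  -- 37 and 73 are both prime
  print (swappableNaive 3 5)  -- 35 = 5 * 7 is composite
```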

0 answers:

No answers