Haskell找到两个最近点之间的距离

时间:2012-11-17 05:56:15

标签: algorithm haskell functional-programming hugs closest-points

给定二维空间中的点列表,您想要执行一个函数 Haskell找到两个最近点之间的距离。 例: 输入:项目[(1,5),(3,4),(2,8),( - 1,2),( - 8.6),(7.0),(1.5),(5.5),(4.8), (7.4)] 输出:2.0

假设列表中两个最远点之间的距离最多为10000。

这是我的代码:

import Data.List
import System.Random

sort_ :: Ord a => [a] -> [a]
sort_ []    =  []
sort_ [x]  =  [x]
sort_ xs   =  merge (sort_ left) (sort_ right)
  where
    (left, right) = splitAt (length xs `div` 2) xs
    merge [] xs = xs
    merge xs [] = xs
    merge (x:xs) (y:ys)=
    if x <= y then 
        x : merge xs (y:ys)
    else  y : merge (x:xs) ys     

project :: [(Float,Float)] -> Float
project [] = 0
project (x:xs)=
    if null (xs) then 
        error "The list have only 1 point"
    else head(sort_(dstList(x:xs)))

distance :: (Float,Float)->(Float,Float) -> Float
distance (x1,y1) (x2,y2) = sqrt((x1 - x2)^2 + (y1 - y2)^2)


dstList :: [(Float,Float)] -> [Float]
dstList (x:xs)=
    if length xs == 1 then 
        (dstBetween x xs):[]
    else (dstBetween x xs):(dstList xs)


dstBetween :: (Float,Float) -> [(Float,Float)] -> Float
dstBetween pnt (x:xs)=
    if null (xs) then 
        distance pnt x
    else  minimum ((distance pnt ):((dstBetween pnt xs)):[])

{-
Calling generator to create a file created at random points
-}
generator = do
    putStrLn "Enter File Name"
    file <- getLine
    g <- newStdGen
    let pts = take 1000 . unfoldr (Just . (\([a,b],c)->((a,b),c)) . splitAt 2) 
                $ randomRs(-1,1) g :: [(Float,Float)]
    writeFile file . show $ pts

{-
Call the main to read a file and pass it to the function of project
The function of the project should keep the name 'project' as described 
in the statement
-}
main= do
    putStrLn "Enter filename to read"
    name <- getLine
    file <- readFile name
    putStrLn . show . project $ readA file

readA::String->[(Float,Float)]
readA = read

我可以按照示例执行程序运行,也可以按如下方式使用生成器:

在haskell解释器中

必须输入“generator”,程序将在此处要求包含一千个点的文件名。在Haskell生成文件后,解释器必须编写main,并请求一个文件名,这是用“generator”创建的文件的名称。

问题是,对于随机生成的1000个点,我的程序需要很长时间,在具有双核处理器的计算机上需要大约3分钟。我究竟做错了什么?如何优化我的代码以更快地工作?

2 个答案:

答案 0 :(得分:13)

您使用的是二次算法:

project []  = error "Empty list of points"
project [_] = error "Single point is given"
project ps  = go 10000 ps
  where
    go a [_]    = a
    go a (p:ps) = let a2 = min a $ minimum [distance p q | q<-ps]
                  in a2 `seq` go a2 ps

您应该使用更好的算法。 Search computational-geometry tag on SO获得更好的算法。

另见http://en.wikipedia.org/wiki/Closest_pair_of_points_problem


@maxtaldykin proposes对算法进行了一个简单而有效的改进,这应该会对随机数据产生真正的影响 - 按X坐标对点进行预先排序,绝不会尝试超过d的点数远离当前点的单位,在X坐标中(其中d是当前已知的最小距离):

import Data.Ord (comparing)
import Data.List (sortBy)

project2 ps@(_:_:_) = go 10000 p1 t 
  where
    (p1:t) = sortBy (comparing fst) ps
    go d _         [] = d
    go d p1@(x1,_) t  = g2 d t
      where
        g2 d []          = go d (head t) (tail t)
        g2 d (p2@(x2,_):r)
           | x2-x1 >= d  = go d (head t) (tail t)
           | d2 >= d     = g2 d  r
           | otherwise   = g2 d2 r   -- change it "mid-flight"
               where
                 d2 = distance p1 p2

对于随机数据,g2将在O(1)时间内生效,以便go占用O(n),整个事情将受到排序的限制,{{1} }。

Empirical orders of growth显示~ n log n代表第一个代码(在1k / 2k范围内)和~ n^2.1代表第二个代码,在10k / 20k范围内,快速测试它编译 - 加载到GHCi(第二个代码运行速度比第一个运行速度快50倍,达到2,000个点,快速运行速度达到3,000个点数的80倍)。

答案 1 :(得分:6)

可以稍微修改您的强力搜索,以便在随机数据上获得更好的性能。

主要思想是按x坐标对点进行排序,并在比较循环中的距离时,仅考虑水平距离不大于当前最小距离的点。

这可能会快一些数量级,但在最坏的情况下它仍然 O(n ^ 2)
实际上,在2000点上,它在我的机器上快了50倍。

project points = loop1 10000 byX
  where
    -- sort points by x coordinate
    --  (you need import Data.Ord to use `comparing`)
    byX = sortBy (comparing fst) points

    -- loop through all points from left to right
    -- threading `d` through iterations as a minimum distance so far
    loop1 d = foldl' loop2 d . tails

    -- `tail` drops leftmost points one by one so `x` is moving from left to right
    -- and `xs` contains all points to the right of `x`
    loop2 d [] = d
    loop2 d (x:xs) = let
        -- we take only those points of `xs` whose horizontal distance
        -- is not greater than current minimum distance
        xs' = takeWhile ((<=d) . distanceX x) xs
        distanceX (a,_) (b,_) = b - a

        -- then just get minimum distance from `x` to those `xs'`
      in minimum $ d : map (distance x) xs'
是的,请不要使用这么多括号。 Haskell不需要包含函数参数。