Question

怎么回事：基于元组（id，x，y）的集合，找到x和y的最小最大值，然后创建两个点（红点）。元组中的每个元素根据朝向红点的距离分组为两组。

phase 1

每组不能超过5个点。如果超过，则应计算新组。我已经成功完成了第一阶段的递归。但我不知道如何在第二阶段做到这一点。第二阶段应如下所示：

phase 2

基于这两组，再次需要找到x和y的最大最大值（对于每组），然后创建四个点（红点）。元组中的每个元素根据朝向红点的距离分组为两组。

getDistance :: (Int, Double, Double) -> (Int, Double, Double) -> Double
getDistance (_,x1,y1) (_,x2,y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2
getTheClusterID :: (Int, Double, Double) -> Int
getTheClusterID  (id, _, _) = id

idxy = [(id, x, y)]
createCluster id cs = [(id, minX, minY),(id+1, maxX, minY), (id+2, minX, maxY), (id+3, maxX, maxY)]
                        where minX = minimum $ map (\(_,x,_,_) -> x) cs
                              maxX = maximum $ map (\(_,x,_,_) -> x) cs
                              minY = minimum $ map (\(_,_,y,_) -> y) cs
                              maxY = maximum $ map (\(_,_,y,_) -> y) cs
idCluster = [1]
cluster = createCluster (last idCluster) idxy

clusterThis (id,a,b) = case (a,b) of
  j | getDistance (a,b) (cluster!!0) < getDistance (a,b) (cluster!!1) &&
        -> (getTheClusterID (cluster!!0), a, b) 
  j | getDistance (a,b) (cluster!!1) < getDistance (a,b) (cluster!!0) &&
        -> (getTheClusterID (cluster!!1), a, b)
  _ -> (getTheClusterID (cluster!!0), a, b)

groupAll = map clusterThis idxy

我正在从命令式转向功能型。对不起，如果我的思维方式仍然是势在必行。还在学习。

修改澄清一下，这是原始数据的样子。

Answer 1

编写这样一个算法时遵循的基本原则是编写小型的组合程序;然后，每个程序都易于推理和单独测试，最终的程序可以用较小的程序编写。

该算法可归纳如下：

计算绑定点集的点。
将其余的点分成两个簇，一个包含更接近最小点的点，另一个包含所有其他点（相当于更接近最大点的点）。
如果任何群集包含超过5个点，请在该群集上重复此过程。

重复过程＆＃39;步骤表示这是一个divide and conquer问题。

我认为每个点都不需要ID，所以我放弃了。

首先，为您将使用的数据的每个类型定义数据类型：

import Data.List (partition)

data Point = Point { ptX :: Double, ptY :: Double }
data Cluster = Cluster { clusterPts :: [Point] }

这对于这样简单的数据来说可能看起来很愚蠢，但它可能会在调试过程中为您节省相当多的混乱。另请注意导入我们稍后将使用的函数。

第一步：

minMaxPoints :: [Point] -> (Point, Point)
minMaxPoints ps = 
   (Point minX minY
   ,Point maxX maxY)
     where minX = minimum $ map ptX ps
           maxX = maximum $ map ptX ps
           minY = minimum $ map ptY ps
           maxY = maximum $ map ptY ps

这与您的createCluster功能基本相同。

第二步：

pointDistance :: Point -> Point -> Double
pointDistance (Point x1 y1) (Point x2 y2) = sqrt $ (x1-x2)^2 + (y1-y2)^2

cluster1 :: [Point] -> [Cluster]
cluster1 ps =
  let (mn, mx) = minMaxPoints ps
      (psmn, psmx) = partition (\p -> pointDistance mn p < pointDistance mx p) ps
  in [ Cluster psmn, Cluster psmx ]

此函数应该清除 - 它是将此步骤的上述语句直接转换为代码。 partition函数接受谓词和列表并生成两个列表，第一个包含谓词为true的所有元素，第二个包含false的所有元素。 pointDistance与您的getDistance函数基本相同。

第3步：

cluster :: [Point] -> [Cluster]
cluster ps =
  cluster1 ps >>= \cl@(Cluster c) ->
  if length c > 5
  then cluster c
  else [cl]

这也非常直接地实现了上述陈述。也许唯一令人困惑的部分是使用>>=，其中（此处）的类型为[a] -> (a -> [b]) -> [b];它只是将给定函数应用于给定列表的每个元素，并连接结果（等效地，它写为flip concatMap）。

最后你的测试用例（我希望我已经从图片正确地翻译成Haskell数据）：

testPts :: [Point]
testPts = map (uncurry Point)
  [ (0,0), (1,0), (2,1), (0,2)
  , (5,2), (5,4), (4,3), (4,4)
  , (8,2), (9,3), (10,2)
  , (11,4), (12,3), (13,3), (13,5) ]

main = mapM_ (print . map (\p -> (ptX p, ptY p)) . clusterPts) $ cluster testPts

运行此程序会产生

[(0.0,0.0),(0.0,2.0),(2.0,1.0),(1.0,0.0)]
[(4.0,4.0),(5.0,2.0),(5.0,4.0),(4.0,3.0)]
[(10.0,2.0),(9.0,3.0),(8.0,2.0)]
[(13.0,3.0),(12.0,3.0),(11.0,4.0),(13.0,5.0)]

Answer 2

功能程序员喜欢递归，但他们竭尽全力避免编写它。人们，Jeez，下定决心！

我喜欢使用常见的，易于理解的组合器来尽可能地构建我的代码。我想演示一种Haskell编程风格，它严重依赖标准工具来尽可能简洁和一般地实现程序的无聊部分（映射，压缩，循环），使您能够专注于手头的问题。

如果您不了解这里的所有内容，请不要担心。我只是想告诉你什么是可能的！（如果您有任何疑问，请询问！）

载体

首先要做的事情是：我们正在处理二维空间，所以我们需要二维向量和一些中学向量代数来处理它们。

我将通过构建我们的向量空间的标量来参数化我的向量。这允许我使用标准类型类，如Functor，因此我可以将构建矢量代数的许多工作委托给机器。我已打开DeriveFunctor和DeriveFoldable，这样我就可以说出 deriving (Functor, Foldable) 这些神奇的词语。

data Pair a = Pair {
    px :: a,
    py :: a
} deriving (Show, Functor, Foldable)

此后我将避免明确使用Pair，并编程到接口，而不是实现。这允许我以与向量空间的维度无关的方式构建简单的线性代数库。我将根据V2：

提供示例类型签名

type V2 = Pair Double

标量乘法：仿函数

vector space需要有两个操作：标量乘法和向量加法。标量乘法意味着将矢量的每个分量乘以常数标量。如果您将向量视为组件的容器，则应该清楚这意味着＆＃34;对容器中的每个元素执行相同的操作＆＃34; - 也就是说，它是映射操作。这是Functor的用途。

-- mul :: Double -> V2 -> V2
mul :: (Functor f, Num n) => n -> f n -> f n
mul k f = fmap (k *) f

矢量添加：zippy applicatives

向量加法涉及逐点累加向量的分量。考虑将矢量作为组件的容器，添加是一个压缩操作 - 匹配两个矢量的每个元素并将它们相加。

Applicative functors是带有额外＆＃34;应用＆＃34;操作。考虑将仿函数f作为容器，Applicative <*> :: f (a -> b) -> f a -> f b为您提供了一种方法来获取函数的容器并将其应用于值的容器以获得新的容器的价值观。应该清楚的是，将Pair变为Applicative的一种方法是使用压缩将函数应用于值。

instance Applicative Pair where
    pure x = Pair x x
    Pair f g <*> Pair x y = Pair (f x) (g y)

（有关zippy应用程序的另一个示例，请参阅this answer of mine。）

现在我们有了拉链两对的方法，我们可以利用a bit of standard Applicative machinery来实现向量添加。

-- add :: V2 -> V2 -> V2
add :: (Applicative f, Num n) => f n -> f n -> f n
add = liftA2 (+)

矢量减法，它为您提供了一种查找两点之间距离的方法，它是根据乘法和加法来定义的。

-- minus :: V2 -> V2 -> V2
minus :: (Applicative f, Num n) => f n -> f n -> f n
v `minus` u = v `add` mul (-1) u

圆点产品：可折叠容器

2D欧几里德空间实际上是一个Hilbert space - 一个矢量空间，它配备了一种以dot product形式测量长度和角度的方法。要获取两个向量的点积，可以将组件相乘，然后将结果相加。再一次，我们将使用Applicative来增加组件，但这只是给了我们另一个向量：我们如何实现＆＃34;将结果相加＆＃34;？

Foldable是允许＆＃34;聚合＆＃34;的容器类。操作foldr :: (a -> b -> b) -> b -> f a -> b。标准前奏sum是根据foldr定义的，所以：

-- dot :: V2 -> V2 -> Double
dot :: (Applicative f, Foldable f, Num n) => f n -> f n -> n
v `dot` u = sum $ liftA2 (*) v u

这为我们提供了一种查找向量绝对长度的方法：将其与自身对齐并取平方根。

-- modulus :: V2 -> Double
modulus :: (Applicative f, Foldable f, Floating n) => f n -> n
modulus v = sqrt $ v `dot` v

因此两点之间的距离是矢量差的模数。

dist :: (Applicative f, Foldable f, Floating n) => f n -> f n -> n
dist v u = modulus (v `minus` u)

N-ary压缩：可穿越的容器

轴对齐（超）矩形只能由两个点定义。我们将一组点的边界框表示为指向边界框对角的Pair向量。

给定组件向量的集合，我们可以通过查找集合中每个组件的最大值和最小值来找到边界框的对角。这要求我们将组件向量的集合压缩或转置到组件集合的向量中。为此，我将使用Traversable＆＃39; sequenceA。

-- boundingBox :: [V2] -> Pair V2
boundingBox :: (Traversable t, Applicative f, Ord n) => t (f n) -> Pair (f n)
boundingBox vs =
    let components = sequenceA vs
    in Pair (minimum <$> components) (maximum <$> components)

聚类

既然我们有一个用于处理向量的库，我们可以深入到算法的多肉部分：将点集分成簇。

分区

让我重新说明算法内循环的规范。您希望分区一组点，具体取决于它们是否更靠近集合边界框的左下角或右上角。这是partition的作用。

我们可以编写一个函数whichCluster，使用minus和modulus来确定单个点，然后使用partition将其应用于整个集合

type Cluster = []
-- cluster :: Cluster V2 -> [Cluster V2]
cluster :: (Applicative f, Foldable f, Ord n, Floating n) => Cluster (f n) -> [Cluster (f n)]
cluster vs =
    let Pair bottomLeft topRight = boundingBox vs
        whichCluster v = dist v bottomLeft <= dist v topRight
        (g1, g2) = partition whichCluster vs
    in [g1, g2]

重复，重复，重复

现在我们要反复cluster，直到我们没有任何超过5的组。这是计划。我们将跟踪两组集群，即足够小的集群和需要进一步子集群的集群。我将使用partition将群集列表分类为足够小的群集和需要子群集的群集。我将使用列表monad 的>>= :: [a] -> (a -> [b]) -> [b]（此处为[Cluster V2] -> ([V2] -> [Cluster V2]) -> [Cluster V2]），它将函数映射到列表并展平结果，以实现子集群的概念。并且我将until重复使用子集群，直到剩余的太大集群为空。

-- smallClusters :: Int -> Cluster V2 -> [Cluster V2]
smallClusters :: (Applicative f, Foldable f, Ord n, Floating n) => Int -> Cluster (f n) -> [Cluster (f n)]
smallClusters maxSize vs = fst $ until (null . snd) splitLarge ([], [vs])
    where
        smallEnough xs = length xs <= maxSize
        splitLarge (small, remaining) =
            let (newSmall, large) = partition smallEnough remaining
            in (small ++ newSmall, large >>= cluster)

快速测试，来自@user2407038's answer：

testPts :: [V2]
testPts = map (uncurry Pair)
    [ (0,0), (1,0), (2,1), (0,2)
    , (5,2), (5,4), (4,3), (4,4)
    , (8,2), (9,3), (10,2)
    , (11,4), (12,3), (13,3), (13,5) ]

ghci> smallClusters 5 testPts
[
    [Pair {px = 0.0, py = 0.0},Pair {px = 1.0, py = 0.0},Pair {px = 2.0, py = 1.0},Pair {px = 0.0, py = 2.0}],
    [Pair {px = 5.0, py = 2.0},Pair {px = 5.0, py = 4.0},Pair {px = 4.0, py = 3.0},Pair {px = 4.0, py = 4.0}],
    [Pair {px = 8.0, py = 2.0},Pair {px = 9.0, py = 3.0},Pair {px = 10.0, py = 2.0}]
    [Pair {px = 11.0, py = 4.0},Pair {px = 12.0, py = 3.0},Pair {px = 13.0, py = 3.0},Pair {px = 13.0, py = 5.0}]
    ]

你去吧。 n维空间中的小聚类，都没有单一的递归函数。

您的原始代码将点数表示为3元组，其中包含位置的两个Double和点标签的Int，但我的V2没有标签。我们能恢复吗？好吧，因为代码在任何时候都没有提到任何具体类型 - 只是标准类型类 - 我们可以为标记向量构建一个新类型。只要所述类型为Foldable Applicative，上述所有代码都将继续有效而无需修改！

data Labelled m f a = Labelled m (f a) deriving (Show, Functor, Foldable)

instance (Monoid m, Applicative f) => Applicative (Labelled m f) where
    pure = Labelled mempty . pure
    Labelled m ff <*> Labelled n fx = Labelled (m <> n) (ff <*> fx)

存在Monoid约束，因为在组合操作时，您还需要一种方法来组合其标签。我只是要使用First - 偏向偏见的选择 - 因为我并不期待这些分数＆＃39;标签与modulus和boundingBox等压缩操作相关。

type LabelledV2 = Labelled (First Int) Pair Double

testPts :: [LabelledV2]
testPts = zipWith (Labelled . First . Just) [0..] $ map (uncurry Pair)
    [ (0,0), (1,0), (2,1), (0,2)
    , (5,2), (5,4), (4,3), (4,4)
    , (8,2), (9,3), (10,2)
    , (11,4), (12,3), (13,3), (13,5) ]

ghci> traverse (traverse (getFirst . lbl)) $ smallClusters 5 testPts
Just [[0,1,2,3],[4,5,6,7],[8,9,10],[11,12,13,14]]  -- try reordering testPts

Haskell ::递归循环中的递归

2 个答案:

载体

标量乘法：仿函数

矢量添加：zippy applicatives

圆点产品：可折叠容器

N-ary压缩：可穿越的容器

聚类

分区

重复，重复，重复

标签