Question

我正在写一个遗传算法来生成字符串“helloworld”。但是当n为10,000或更多时，evolve函数会产生堆栈溢出。

module Genetics where

import Data.List (sortBy)
import Random (randomRIO)
import Control.Monad (foldM)

class Gene g where
    -- How ideal is the gene from 0.0 to 1.0?
    fitness :: g -> Float

    -- How does a gene mutate?
    mutate :: g -> IO g

    -- How many species will be explored?
    species :: [g] -> Int

orderFitness :: (Gene g) => [g] -> [g]
orderFitness = reverse . sortBy (\a b -> compare (fitness a) (fitness b))

compete :: (Gene g) => [g] -> IO [g]
compete pool = do
    let s = species pool
    variants <- (mapM (mapM mutate) . map (replicate s)) pool
    let pool' = (map head . map orderFitness) variants
    return pool'

evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = do
    pool' <- compete pool
    evolve (n - 1) pool'

对于species pool = 8，8个基因库重复到8个组。每组变异，并选择每组最适合进一步进化（回到8个基因）。

GitHub

Answer 1

如果您对性能感兴趣，我会使用快速随机数生成器，例如：

The mersenne-random package

其次，compete看起来非常可疑，因为它完全是懒惰的，尽管建造了一些潜在的大型结构。尝试使用deepseq锤子重写它有点严格：

import Control.DeepSeq    

compete :: (Gene g, NFData g) => [g] -> IO [g]
compete pool = do
    let s = species pool
    variants <- (mapM (mapM mutate) . map (replicate s)) pool
    let pool' = (map head . map orderFitness) variants
    pool' `deepseq` return pool'

但这些东西都不需要在IO中，（单独的问题）。像Rand monad这样的东西可能是more appropriate。

Answer 2

感谢Don的deepseq建议，我能够将问题范围缩小到mapM mutate，因为太多了。新版本有mutate'，使用seq来阻止上传。

module Genetics where

import Data.List (maximumBy)
import Random (randomRIO)

class Gene g where
    -- How ideal is the gene from 0.0 to 1.0?
    fitness :: g -> Float

    -- How does a gene mutate?
    mutate :: g -> IO g

    -- How many species will be explored in each round?
    species :: [g] -> Int

best :: (Gene g) => [g] -> g
best = maximumBy (\a b -> compare (fitness a) (fitness b))

-- Prevents stack overflow
mutate' :: (Gene g) => g -> IO g
mutate' gene = do
    gene' <- mutate gene
    gene' `seq` return gene'

drift :: (Gene g) => [[g]] -> IO [[g]]
drift = mapM (mapM mutate')

compete :: (Gene g) => [g] -> IO [g]
compete pool = do
    let islands = map (replicate (species pool)) pool
    islands' <- drift islands
    let representatives = map best islands'
    return representatives

evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = compete pool >>= evolve (n - 1)

GitHub

Answer 3

您可以使用(map head . map orderFitness)和单orderFitness，而不是使用sortBy maximumBy map deepseq。这不会节省太多（因为你从O（n log n）到O（n）并且可能从消除双映射获得另一个因子2），但是至少有点简单和更有效。你也可以摆脱逆转的呼吁。

我怀疑这可以在没有head . sortBy的情况下解决问题，但它应该是一种改进。

编辑：如果标准库和GHC是完美的，那么maximumBy将生成与map head . map sortBy完全相同的代码，而map (head . sortBy)会生成与sortBy相同的代码，遗憾的是这些代码都不是事情在实践中可能是真实的。 maximumBy会倾向于做一堆额外的内存分配，因为它是一种分而治之的算法。组合地图是您有时获得的优化，但不应指望。

更重要的是，使用{{1}}更具说明性。更容易看到代码的作用以及需要多长时间。利用优化也应该更容易，因为我们知道目标是什么，而不仅仅是我们如何得到它。

Haskell Stack溢出

3 个答案: