Haskell Stack溢出

时间:2011-05-10 21:15:23

标签: memory haskell random stack-overflow genetic-algorithm

我正在写一个遗传算法来生成字符串“helloworld”。但是当n为10,000或更多时,evolve函数会产生堆栈溢出。

module Genetics where

import Data.List (sortBy)
import Random (randomRIO)
import Control.Monad (foldM)

class Gene g where
    -- How ideal is the gene from 0.0 to 1.0?
    fitness :: g -> Float

    -- How does a gene mutate?
    mutate :: g -> IO g

    -- How many species will be explored?
    species :: [g] -> Int

orderFitness :: (Gene g) => [g] -> [g]
orderFitness = reverse . sortBy (\a b -> compare (fitness a) (fitness b))

compete :: (Gene g) => [g] -> IO [g]
compete pool = do
    let s = species pool
    variants <- (mapM (mapM mutate) . map (replicate s)) pool
    let pool' = (map head . map orderFitness) variants
    return pool'

evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = do
    pool' <- compete pool
    evolve (n - 1) pool'

对于species pool = 8,8个基因库重复到8个组。每组变异,并选择每组最适合进一步进化(回到8个基因)。

GitHub

3 个答案:

答案 0 :(得分:3)

如果您对性能感兴趣,我会使用快速随机数生成器,例如:

其次,compete看起来非常可疑,因为它完全是懒惰的,尽管建造了一些潜在的大型结构。尝试使用deepseq锤子重写它有点严格:

import Control.DeepSeq    

compete :: (Gene g, NFData g) => [g] -> IO [g]
compete pool = do
    let s = species pool
    variants <- (mapM (mapM mutate) . map (replicate s)) pool
    let pool' = (map head . map orderFitness) variants
    pool' `deepseq` return pool'

但这些东西都不需要在IO中,(单独的问题)。像Rand monad这样的东西可能是more appropriate

答案 1 :(得分:2)

感谢Don的deepseq建议,我能够将问题范围缩小到mapM mutate,因为太多了。新版本有mutate',使用seq来阻止上传。

module Genetics where

import Data.List (maximumBy)
import Random (randomRIO)

class Gene g where
    -- How ideal is the gene from 0.0 to 1.0?
    fitness :: g -> Float

    -- How does a gene mutate?
    mutate :: g -> IO g

    -- How many species will be explored in each round?
    species :: [g] -> Int

best :: (Gene g) => [g] -> g
best = maximumBy (\a b -> compare (fitness a) (fitness b))

-- Prevents stack overflow
mutate' :: (Gene g) => g -> IO g
mutate' gene = do
    gene' <- mutate gene
    gene' `seq` return gene'

drift :: (Gene g) => [[g]] -> IO [[g]]
drift = mapM (mapM mutate')

compete :: (Gene g) => [g] -> IO [g]
compete pool = do
    let islands = map (replicate (species pool)) pool
    islands' <- drift islands
    let representatives = map best islands'
    return representatives

evolve :: (Gene g) => Int -> [g] -> IO [g]
evolve 0 pool = return pool
evolve n pool = compete pool >>= evolve (n - 1)

GitHub

答案 2 :(得分:1)

您可以使用(map head . map orderFitness)和单orderFitness,而不是使用sortBy maximumBy map deepseq。这不会节省太多(因为你从O(n log n)到O(n)并且可能从消除双映射获得另一个因子2),但是至少有点简单和更有效。你也可以摆脱逆转的呼吁。

我怀疑这可以在没有head . sortBy的情况下解决问题,但它应该是一种改进。

编辑:如果标准库和GHC是完美的,那么maximumBy将生成与map head . map sortBy完全相同的代码,而map (head . sortBy)会生成与sortBy相同的代码,遗憾的是这些代码都不是事情在实践中可能是真实的。 maximumBy会倾向于做一堆额外的内存分配,因为它是一种分而治之的算法。组合地图是您有时获得的优化,但不应指望。

更重要的是,使用{{1}}更具说明性。更容易看到代码的作用以及需要多长时间。利用优化也应该更容易,因为我们知道目标是什么,而不仅仅是我们如何得到它。