Question

我使用GA Package来最小化功能。以下是我实施的几个阶段。

0。库和数据集

library(clusterSim)      ## for index.DB()
library(GA)              ## for ga() 
data("data_ratio")
dataset2 <- data_ratio
set.seed(555)

1。二进制编码并生成初始填充。

initial_population <- function(object) {
    ## generate a population where for each individual, there will be number of 1's fixed between three to six
    population <- t(replicate(object@popSize, {i <- sample(3:6, 1); sample(c(rep(1, i), rep(0, object@nBits - i)))}))
    return(population)
}

2。健身功能最小化Davies-Bouldin（DB）指数。

DBI2 <- function(x) {
    ## number of 1's will represent the initial selected centroids and hence the number of clusters
    cl <- kmeans(dataset2, dataset2[x == 1, ])
    dbi <- index.DB(dataset2, cl=cl$cluster, centrotypes = "centroids")
    score <- -dbi$DB

    return(score)
}

第3。用户定义的交叉算子。这种交叉方法可以避免没有群集被打开的情况。可以找到伪代码here。

pairwise_crossover <- function(object, parents){
    fitness <- object@fitness[parents]
    parents <- object@population[parents, , drop = FALSE]
    n <- ncol(parents)
    children <- matrix(as.double(NA), nrow = 2, ncol = n)
    fitnessChildren <- rep(NA, 2)

    ## finding the min no. of 1's between 2 parents
    m <- min(sum(parents[1, ] == 1), sum(parents[2, ] == 1))
    ## generate a random int from range(1,m)
    random_int <- sample(1:m, 1)
    ## randomly select 'random_int' gene positions with 1's in parent[1, ]
    random_a <- sample(1:length(parents[1, ]), random_int)
    ## randomly select 'random_int' gene positions with 1's in parent[1, ]
    random_b <- sample(1:length(parents[2, ]), random_int)
    ## union them
    all <- sort(union(random_a, random_b))
    ## determine the union positions
    temp_a <- parents[1, ][all]
    temp_b <- parents[2, ][all]

    ## crossover
    parents[1, ][all] <- temp_b
    children[1, ] <- parents[1, ]
    parents[2, ][all] <- temp_a
    children[2, ] <- parents[2, ]

    out <- list(children = children, fitness = fitnessChildren)
    return(out)
}

4。突变

k_min <- 2
k_max <- ceiling(sqrt(75))

my_mutation <- function(object, parent){
    pop <- parent <- as.vector(object@population[parent, ])
    for(i in 1:length(pop)){
            if((sum(pop == 1) < k_max) && pop[i] == 0 | (sum(pop == 1) > k_min && pop[i] == 1)) {
                    pop[i] <- abs(pop[i] - 1)
                    return(pop) 
            }

    }

}

5。将各个部分放在一起。使用轮盘赌选择，交叉概率。 = 0.8，突变概率。 = 0.1

g2<- ga(type = "binary", 
    population = initial_population, 
    fitness = DBI2, 
    selection = ga_rwSelection,
    crossover = pairwise_crossover,
    mutation = my_mutation,
    pcrossover = 0.8,
    pmutation = 0.1,
    popSize = 100, 
    nBits = nrow(dataset2))

我已经创建了我的初始人口，对于人口中的每个人，将有1's的数量固定在三到六之间。交叉和变异算子旨在确保解决方案最终不会有太多的群集（1's）被打开＆＃39;。我在整合它们之前已经分别尝试了我的交叉和变异功能，它们似乎工作得很好。

理想情况下，最终解决方案的初始人口数量为1's + - = 1，即，如果一个人的染色体中有三个1's，它最终会随机出现两个，三或四1's。但是我得到了这个解决方案，它显示了12个群集（1's）被启用了，这意味着交叉和变异运算符的运行状况良好。

> sum(g2@solution==1)
[1] 12

这里的问题可以通过复制所有代码来重现。熟悉GA套餐的人可以帮助我吗？

[EDITED]

尝试使用其他数据集iris，让我陷入以下错误。（仅更改了数据，其余设置仍然存在）

0。库和数据集

library(clusterSim)      ## for index.DB()
library(GA)              ## for ga() 
## removed last column since it is a categorical data
dataset2 <- iris[-5]
set.seed(555)

> Error in kmeans(dataset2, centers = dataset2[x == 1, ]) : 
  initial centers are not distinct

我尝试查看code，发现此错误是由if(any(duplicated(centers)))引起的。它可能意味着什么？

Answer 1

值得一提的几点：

在crossover中，为了随机选择＆＃39; random_int＆＃39;在parent[1, ]中使用1＆＃39;的基因位置，您可以从
更改以下代码行
random_a <- sample(1:length(parents[1, ]), random_int)

到

random_a <- sample(which(parents[1, ]==1), random_int)

和其他人一样。

然而，我认为这种交叉策略可以保证任何后代都可以将群集位的总数最多打开作为其父项的最大1位数（在这种情况下，从初始群体中可以为6，不应该如果你想在解决方案基因中只有1位差异，它会是4？）。

下图显示3个随机选择的位置，其中至少一个亲本基因具有1位，而交叉和后代产生。

在mutation函数中，我认为，为了更明确，我们应该更改这行代码

if((sum(pop == 1) < k_max) && pop[i] == 0 | (sum(pop == 1) > k_min && pop[i] == 1))

通过

if((sum(pop == 1) < k_max && pop[i] == 0) | (sum(pop == 1) > k_min && pop[i] == 1))

用正确的括号。
此外，您的fitness函数（Davies-Bouldin's index测量群集分离）似乎有利于打开更多群集。

最后我认为mutation是罪魁祸首，如果你将k_max改为低值（例如3）而pmutation改为a低值（例如，pmutation = 0.01），您将在最终解决方案中发现所有基因都打开了4位。

<强> [EDITED]

set.seed(1234)
k_min = 2
k_max = 3 #ceiling(sqrt(75))
#5. Putting the pieces together. Using roulette-wheel selection, crossover prob. = 0.8, mutation prob. = 0.1
g2<- ga(type = "binary", 
        population = initial_population, 
        fitness = DBI2, 
        selection = ga_rwSelection,
        crossover = pairwise_crossover,
        mutation = my_mutation,
        pcrossover = 0.8,
        pmutation = 0.01,
        popSize = 100, 
        nBits = nrow(dataset2))    

g2@solution # there are 6 solution genes
    x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22 x23 x24 x25 x26 x27 x28 x29 x30 x31 x32 x33 x34 x35 x36 x37
[1,]  0  0  0  0  0  0  1  0  1   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[2,]  0  0  0  0  0  0  1  0  0   0   1   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[3,]  0  0  0  0  0  0  1  0  0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[4,]  0  0  0  0  0  0  1  1  0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[5,]  0  0  0  1  0  0  0  0  0   0   1   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[6,]  0  0  0  1  0  0  0  0  0   0   1   0   0   0   0   0   0   0   0   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
     x38 x39 x40 x41 x42 x43 x44 x45 x46 x47 x48 x49 x50 x51 x52 x53 x54 x55 x56 x57 x58 x59 x60 x61 x62 x63 x64 x65 x66 x67 x68 x69 x70 x71 x72
[1,]   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[2,]   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[3,]   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[4,]   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[5,]   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
[6,]   0   0   1   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0   0
     x73 x74 x75
[1,]   0   0   0
[2,]   0   0   0
[3,]   0   0   0
[4,]   0   0   0
[5,]   0   0   0
[6,]   0   0   0

rowSums(g2@solution) # all of them have 4 bits on
#[1] 4 4 4 4 4 4

<强> [EDIT2]

实际上，crossover策略保证组合父项没有额外的位，即子节点中的1位总数=父母中的1位总数，但任何单个孩子都可以有更多的位打开。下面显示了个别后代可以比任何父母打开更多位的示例：

遗传算法 - 交叉和变异无法正常工作

1 个答案: