我使用GA Package来最小化功能。以下是我实施的几个阶段。
0。库和数据集
library(clusterSim) ## for index.DB()
library(GA) ## for ga()
data("data_ratio")
dataset2 <- data_ratio
set.seed(555)
1。二进制编码并生成初始填充。
initial_population <- function(object) {
## generate a population where for each individual, there will be number of 1's fixed between three to six
population <- t(replicate(object@popSize, {i <- sample(3:6, 1); sample(c(rep(1, i), rep(0, object@nBits - i)))}))
return(population)
}
2。健身功能最小化Davies-Bouldin(DB)指数。
DBI2 <- function(x) {
## number of 1's will represent the initial selected centroids and hence the number of clusters
cl <- kmeans(dataset2, dataset2[x == 1, ])
dbi <- index.DB(dataset2, cl=cl$cluster, centrotypes = "centroids")
score <- -dbi$DB
return(score)
}
第3。用户定义的交叉算子。这种交叉方法可以避免没有群集被打开的情况。可以找到伪代码here。
pairwise_crossover <- function(object, parents){
fitness <- object@fitness[parents]
parents <- object@population[parents, , drop = FALSE]
n <- ncol(parents)
children <- matrix(as.double(NA), nrow = 2, ncol = n)
fitnessChildren <- rep(NA, 2)
## finding the min no. of 1's between 2 parents
m <- min(sum(parents[1, ] == 1), sum(parents[2, ] == 1))
## generate a random int from range(1,m)
random_int <- sample(1:m, 1)
## randomly select 'random_int' gene positions with 1's in parent[1, ]
random_a <- sample(1:length(parents[1, ]), random_int)
## randomly select 'random_int' gene positions with 1's in parent[1, ]
random_b <- sample(1:length(parents[2, ]), random_int)
## union them
all <- sort(union(random_a, random_b))
## determine the union positions
temp_a <- parents[1, ][all]
temp_b <- parents[2, ][all]
## crossover
parents[1, ][all] <- temp_b
children[1, ] <- parents[1, ]
parents[2, ][all] <- temp_a
children[2, ] <- parents[2, ]
out <- list(children = children, fitness = fitnessChildren)
return(out)
}
4。突变
k_min <- 2
k_max <- ceiling(sqrt(75))
my_mutation <- function(object, parent){
pop <- parent <- as.vector(object@population[parent, ])
for(i in 1:length(pop)){
if((sum(pop == 1) < k_max) && pop[i] == 0 | (sum(pop == 1) > k_min && pop[i] == 1)) {
pop[i] <- abs(pop[i] - 1)
return(pop)
}
}
}
5。将各个部分放在一起。使用轮盘赌选择,交叉概率。 = 0.8,突变概率。 = 0.1
g2<- ga(type = "binary",
population = initial_population,
fitness = DBI2,
selection = ga_rwSelection,
crossover = pairwise_crossover,
mutation = my_mutation,
pcrossover = 0.8,
pmutation = 0.1,
popSize = 100,
nBits = nrow(dataset2))
我已经创建了我的初始人口,对于人口中的每个人,将有1's
的数量固定在三到六之间。交叉和变异算子旨在确保解决方案最终不会有太多的群集(1's
)被打开&#39;。我在整合它们之前已经分别尝试了我的交叉和变异功能,它们似乎工作得很好。
理想情况下,最终解决方案的初始人口数量为1's
+ - = 1,即,如果一个人的染色体中有三个1's
,它最终会随机出现两个,三或四1's
。但是我得到了这个解决方案,它显示了12个群集(1's
)被启用了,这意味着交叉和变异运算符的运行状况良好。
> sum(g2@solution==1)
[1] 12
这里的问题可以通过复制所有代码来重现。熟悉GA套餐的人可以帮助我吗?
[EDITED]
尝试使用其他数据集iris
,让我陷入以下错误。 (仅更改了数据,其余设置仍然存在)
0。库和数据集
library(clusterSim) ## for index.DB()
library(GA) ## for ga()
## removed last column since it is a categorical data
dataset2 <- iris[-5]
set.seed(555)
> Error in kmeans(dataset2, centers = dataset2[x == 1, ]) :
initial centers are not distinct
我尝试查看code,发现此错误是由if(any(duplicated(centers)))
引起的。它可能意味着什么?
答案 0 :(得分:2)
值得一提的几点:
在crossover
中,为了随机选择&#39; random_int&#39;在parent[1, ]
中使用1&#39;的基因位置,您可以从
random_a <- sample(1:length(parents[1, ]), random_int)
到
random_a <- sample(which(parents[1, ]==1), random_int)
和其他人一样。
然而,我认为这种交叉策略可以保证任何后代都可以将群集位的总数最多打开作为其父项的最大1位数(在这种情况下,从初始群体中可以为6,不应该如果你想在解决方案基因中只有1位差异,它会是4?)。
下图显示3个随机选择的位置,其中至少一个亲本基因具有1位,而交叉和后代产生。
在mutation
函数中,我认为,为了更明确,我们应该更改这行代码
if((sum(pop == 1) < k_max) && pop[i] == 0 | (sum(pop == 1) > k_min && pop[i] == 1))
通过
if((sum(pop == 1) < k_max && pop[i] == 0) | (sum(pop == 1) > k_min && pop[i] == 1))
用正确的括号。
此外,您的fitness
函数(Davies-Bouldin's index
测量群集分离)似乎有利于打开更多群集。
最后我认为mutation
是罪魁祸首,如果你将k_max
改为低值(例如3)而pmutation
改为a低值(例如,pmutation = 0.01
),您将在最终解决方案中发现所有基因都打开了4位。
<强> [EDITED] 强>
set.seed(1234)
k_min = 2
k_max = 3 #ceiling(sqrt(75))
#5. Putting the pieces together. Using roulette-wheel selection, crossover prob. = 0.8, mutation prob. = 0.1
g2<- ga(type = "binary",
population = initial_population,
fitness = DBI2,
selection = ga_rwSelection,
crossover = pairwise_crossover,
mutation = my_mutation,
pcrossover = 0.8,
pmutation = 0.01,
popSize = 100,
nBits = nrow(dataset2))
g2@solution # there are 6 solution genes
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 x11 x12 x13 x14 x15 x16 x17 x18 x19 x20 x21 x22 x23 x24 x25 x26 x27 x28 x29 x30 x31 x32 x33 x34 x35 x36 x37
[1,] 0 0 0 0 0 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
x38 x39 x40 x41 x42 x43 x44 x45 x46 x47 x48 x49 x50 x51 x52 x53 x54 x55 x56 x57 x58 x59 x60 x61 x62 x63 x64 x65 x66 x67 x68 x69 x70 x71 x72
[1,] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[2,] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[3,] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[4,] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[5,] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
[6,] 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
x73 x74 x75
[1,] 0 0 0
[2,] 0 0 0
[3,] 0 0 0
[4,] 0 0 0
[5,] 0 0 0
[6,] 0 0 0
rowSums(g2@solution) # all of them have 4 bits on
#[1] 4 4 4 4 4 4
<强> [EDIT2] 强>