Question: I am using the following code:
dist<-c('att1','att2','att3','att4','att5','att6')
p1<-c('att1','att5','att2')
p2<-c('att5','att1','att4')
p3<-c('att3','att4','att2')
p4<-c('att1','att2','att3')
p5<-c('att6')
....
p32<-c('att35','att34','att32')
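In practice there can be 1024 vectors.
I want to find the p vectors whose union covers as many elements of dist as possible. In this case the solution would be p1, p3, p5. I want to select the minimal number of p vectors. In addition, if there is no way to cover every element of dist, I want the maximal cover that uses the minimal number of vectors (p).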
N = 32
library(qdapTools)
library(dplyr)
library(data.table)
## generate matrix of attributes
attribute_matrix <- mtabulate(list(p1, p2, p3, p4, p5,...,p32))
library(bigmemory)
## generate every 0/1 combination of the N vectors (2^N rows)
grid_matrix <- do.call(CJ, rep(list(1:0), N)) %>% as.big.matrix
Error: cannot allocate vector of size 8.0 Gb
I tried another approach:
grid_matrix <- do.call(CJ, rep(list(1:0), N)) %>% as.data.frame
grid_matrix <- as.matrix (grid_matrix)
I still get the same error.
How can I fix this and make it work for large data? I would like to continue with:
colnames(grid_matrix) <- paste0("p", 1:N)
combin_all_element_present <- rowSums(grid_matrix %*% attribute_matrix > 0) %>% `==`(., ncol(attribute_matrix))
grid_matrix_sub <- grid_matrix[combin_all_element_present, ]
grid_matrix_sub[rowSums(grid_matrix_sub) == min(rowSums(grid_matrix_sub)), ]
Answer 0 (score: 2)
This is known as the set cover problem. It can be solved using integer linear programming. Let x1, x2, ... be 0/1 variables (one for each p variable), and represent p1, p2, ... as 0/1 vectors P1, P2, ... and dist as a 0/1 vector D. The problem can then be stated as:
min x1 + x2 + ... + x32
such that
P1 * x1 + P2 * x2 + ... + P32 * x32 >= D
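In R code it looks as follows. First create a list p containing the p vectors in sorted order. mixedsort is used so that p32 comes at the end rather than right after p3. Define attnames as the set of all att names found in any p vector.
Then formulate the objective function (which equals the number of p vectors in the cover), the constraint matrix (which has the P vectors as its columns) and the right-hand side of the constraints (which is dist as a 0/1 vector). Finally, run the integer linear program and convert the solution from a 0/1 vector to a vector of p names.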
library(gtools)
library(lpSolve)
p <- mget(mixedsort(ls(pattern = "^p\\d+$")))
attnames <- mixedsort(unique(unlist(p)))
objective <- rep(1L, length(p))
const.mat <- sapply(p, function(x) attnames %in% x) + 0L
const.rhs <- (attnames %in% dist) + 0L
ans <- lp("min", objective, const.mat, ">=", const.rhs, all.bin = TRUE)
names(p)[ans$solution == 1L]
## [1] "p2" "p4" "p5"
The constraint matrix has one row for each entry of attnames and one column for each p vector.
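To make that layout concrete, here is a small sketch in base R that builds the same kind of matrix for the first five sample vectors only. It assumes the vectors are held in a named list (the answer's code collects them with mget instead) and uses plain sort, which is sufficient here because the names have single-digit suffixes.

```r
# Sample vectors from the question, held in a named list (an assumption for
# this sketch; the answer gathers them from the workspace with mget).
p <- list(
  p1 = c("att1", "att5", "att2"),
  p2 = c("att5", "att1", "att4"),
  p3 = c("att3", "att4", "att2"),
  p4 = c("att1", "att2", "att3"),
  p5 = c("att6")
)

# All attribute names that occur in any p vector.
attnames <- sort(unique(unlist(p)))

# One row per attribute, one column per p vector; an entry is 1 when the
# vector contains the attribute and 0 otherwise.
const.mat <- sapply(p, function(x) attnames %in% x) + 0L
rownames(const.mat) <- attnames

print(const.mat)
```

Each column is the 0/1 vector P1, P2, ... from the formulation, so a row of the constraint `const.mat %*% x >= D` requires that at least one selected vector contains that attribute.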
The solution gives the minimum cover of those elements of attnames that are in dist. If every element of dist appears in at least one p vector, the solution is a cover of dist. If not, it is a cover of those att names that appear in one or more p vectors and are also in dist; hence this handles both cases discussed in the question. The uncovered elements of dist are:
setdiff(dist, attnames)
so if that has length zero, the solution is a complete cover of dist. If not, the solution is a cover of:
intersect(dist, attnames)
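As a quick sanity check of these two cases, the following base R sketch verifies that a chosen set of vectors really covers everything that can be covered. It assumes the sample data from the question held in a named list; `chosen`, `coverable`, `uncovered` and `covers_all` are hypothetical names introduced for illustration.

```r
dist <- c("att1", "att2", "att3", "att4", "att5", "att6")
p <- list(
  p1  = c("att1", "att5", "att2"),
  p2  = c("att5", "att1", "att4"),
  p3  = c("att3", "att4", "att2"),
  p4  = c("att1", "att2", "att3"),
  p5  = c("att6"),
  p32 = c("att35", "att34", "att32")
)
attnames <- unique(unlist(p))

# Solution reported by the integer program above.
chosen <- c("p2", "p4", "p5")

# Elements of dist that no p vector contains (nothing can cover these).
uncovered <- setdiff(dist, attnames)

# Elements of dist that at least one p vector contains.
coverable <- intersect(dist, attnames)

# TRUE when the chosen vectors together contain every coverable element.
covers_all <- all(coverable %in% unlist(p[chosen]))

print(uncovered)
print(covers_all)
```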
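The sorting done in the code is not strictly necessary, but having the rows and columns of the constraint matrix in a logical order can make the various inputs to the optimization easier to work with.
Note: run the code from the question before running the code above: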
dist<-c('att1','att2','att3','att4','att5','att6')
p1<-c('att1','att5','att2')
p2<-c('att5','att1','att4')
p3<-c('att3','att4','att2')
p4<-c('att1','att2','att3')
p5<-c('att6')
p32<-c('att35','att34','att32')
Answer 1 (score: 1)
The answer already given is perfect, but another approach could be the following:
dist<-c('att1','att2','att3','att4','att5','att6')
p1<-c('att1','att5','att2')
p2<-c('att5','att1','att4')
p3<-c('att3','att4','att2')
p4<-c('att1','att2','att3')
p5<-c('att6')
library(qdapTools)
library(data.table)
attribute_matrix <- mtabulate(list(p1, p2, p3, p4, p5))
minimal_sets <- function(superset, subsets_matrix, p){
  setDT(subsets_matrix)
  # removing the columns that are not in the superset
  updated_sub_matr <- subsets_matrix[, which(names(subsets_matrix) %in% superset), with = FALSE]
  # initializing counter for iterations and the subset selected
  subset_selected <- integer(0)
  counter <- p
  ## Loop until either we ran out of iterations (counter = 0) or we found the solution
  while (counter > 0 && length(superset) > 0){
    ## find the row with the most matches with the superset we want to achieve
    max_index <- which.max(rowSums(updated_sub_matr))
    ## remove from the superset the entries that match that row, and from the
    ## subsets matrix those columns, as they do not contribute anymore
    superset <- superset[which(updated_sub_matr[max_index, ] == 0)]
    updated_sub_matr <- updated_sub_matr[, - which(updated_sub_matr[max_index, ] != 0), with = FALSE]
    counter <- counter - 1
    subset_selected <- c(subset_selected, max_index)
  }
  if (length(superset) > 0){
    print(paste0("No solution found, there are(is) ", length(superset), " element(s) left ", paste(superset, collapse = "-")))
  } else {
    print(paste0("Found a solution after ", p - counter, " iterations"))
  }
  print(paste0("Selected the following subsets: ", paste(subset_selected, collapse = "-")))
}
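To use this function, pass in your superset (dist in this case), the attribute_matrix you want to check, and a number p, which caps the number of iterations (and hence the number of subsets selected). It prints the best solution it found and the number of iterations used.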
> minimal_sets(dist, attribute_matrix, 1)
[1] "No solution found, there are(is) 3 element(s) left att3-att4-att6"
[1] "Selected the following subsets: 1"
> minimal_sets(dist, attribute_matrix, 3)
[1] "Found a solution after 3 iterations"
[1] "Selected the following subsets: 1-3-5"
> minimal_sets(dist, attribute_matrix, 5)
[1] "Found a solution after 3 iterations"
[1] "Selected the following subsets: 1-3-5"
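
The function above picks, at each step, the subset that covers the most remaining elements. That greedy rule is a heuristic: it is fast, but in general it is not guaranteed to return the smallest possible cover, unlike the integer programming approach. A minimal base R sketch of the same idea, working directly on a named list instead of a matrix, might look like this (greedy_cover, universe, sets and max_picks are hypothetical names, not part of the original answer):

```r
# Greedy set cover on a named list of character vectors (base R only).
greedy_cover <- function(universe, sets, max_picks = length(sets)) {
  # Only elements that occur in some set can ever be covered.
  remaining <- intersect(universe, unlist(sets))
  chosen <- character(0)
  while (length(remaining) > 0 && length(chosen) < max_picks) {
    # Number of still-uncovered elements each set would add.
    gain <- vapply(sets, function(s) sum(remaining %in% s), integer(1))
    # Take the set with the largest gain (first one on ties).
    best <- names(sets)[which.max(gain)]
    chosen <- c(chosen, best)
    remaining <- setdiff(remaining, sets[[best]])
  }
  list(chosen = chosen,
       uncovered = setdiff(universe, unlist(sets[chosen])))
}

dist <- c("att1", "att2", "att3", "att4", "att5", "att6")
p <- list(
  p1 = c("att1", "att5", "att2"),
  p2 = c("att5", "att1", "att4"),
  p3 = c("att3", "att4", "att2"),
  p4 = c("att1", "att2", "att3"),
  p5 = c("att6")
)

res <- greedy_cover(dist, p)
print(res$chosen)
print(res$uncovered)
```

On the sample data this selects p1, p3 and p5, matching the output shown above. Setting max_picks to a smaller value plays the same role as the p argument of minimal_sets: the loop stops early and the elements still missing are reported in uncovered.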