我的目标是在R中编写一个将数据框作为输入并返回其“最佳平方子集”的函数。
通过最佳平方子集,我的意思是输出需要确认以下内容:
让我们举三个例子:
example1 <- structure(list(Afternoon = c(20800L, 15254L, 17426L, 4391L, 39194L
), Evening = c(21679L, 0L, 2973L, 37L, 435L), Morning = c(0L,
3726L, 0L, 0L, 0L)), .Names = c("Afternoon", "Evening", "Morning"
), row.names = c("Friday", "Monday", "Thursday", "Tuesday", "Wednesday"
), class = "data.frame")
example2 <- structure(list(Afternoon = c(1227364L, 219402L, 3L, 0L, 530891L,
153124L, 281788L), Evening = c(570618L, 167216L, 31L, 10L, 88702L,
161006L, 42L), Morning = c(0L, 121775L, 0L, 0L, 0L, 25133L, 270162L
)), .Names = c("Afternoon", "Evening", "Morning"), row.names = c("Friday",
"Monday", "Saturday", "Sunday", "Thursday", "Tuesday", "Wednesday"
), class = "data.frame")
example3 <- structure(list(Afternoon = c(20800L, 258L, 300L, 563L, 2000L
), Evening = c(21679L, 0L, 2973L, 37L, 435L), Morning = c(0L,
3726L, 0L, 0L, 0L)), .Names = c("Afternoon", "Evening", "Morning"
), row.names = c("Friday", "Monday", "Thursday", "Tuesday", "Wednesday"
), class = "data.frame")
看起来像这样:
> example1
Afternoon Evening Morning
Friday 20800 21679 0
Monday 15254 0 3726
Thursday 17426 2973 0
Tuesday 4391 37 0
Wednesday 39194 435 0
> example2
Afternoon Evening Morning
Friday 1227364 570618 0
Monday 219402 167216 121775
Saturday 3 31 0
Sunday 0 10 0
Thursday 530891 88702 0
Tuesday 153124 161006 25133
Wednesday 281788 42 270162
> example3
Afternoon Evening Morning
Friday 20800 21679 0
Monday 258 0 3726
Thursday 300 2973 0
Tuesday 563 37 0
Wednesday 2000 435 0
我正在寻找的功能应将上述三个示例分别分为以下三个示例:
> output1
Afternoon
Friday 20800
Monday 15254
Thursday 17426
Tuesday 4391
Wednesday 39194
正方形的得分/面积是5。其他任何东西都会更少。例如,选择“星期五”,“星期四下午”,则得分为4
> output2
Afternoon Evening
Friday 1227364 570618
Monday 219402 167216
Thursday 530891 88702
Tuesday 153124 161006
在这里,某人的第一个想法是选择星期一,星期二和星期三的所有时间,并得到9分。但是,Wendesday晚间的42分验证了第一个标准。在星期一和星期二的所有时间段中,得分/面积均为6
> output3
Afternoon Evening
Friday 20800 21679
在这里,我们有两个可能的正方形可以验证前两个条件:星期五下午和晚上或星期五和星期三下午。我们必须选择第一种方法,因为单元格内的总和要高于第二种情况。此规则仅在有领带的情况下适用。
答案 0 :(得分:2)
最直观的解决方案是遍历行和列的所有可能组合,并检查选定的行和列是否形成一个完整的正方形,如果是,则检查是否形成最大的正方形。这种方法的潜在问题是,如果您有许多列和行,则需要花费一定的时间才能完成,这不是最佳选择。当列和行的数量不是大于12时,我对这里出现的问题的回答就可以很好地工作(在具有16GB RAM,2.7 GHz CPU和Windows 10pro 64位,R版本3.5.1的PC上)。 >
#library(gtools)
find_best_square <- function(x, thresh = 2000){
# x <- example1
x[x<thresh] <- 0
# for larger datasets only: removing lonely cells
if (ncol(x) > 7 | nrow(x)> 7){
for (i in 1:nrow(x)){
for (j in 1:ncol(x)){
if((colSums(x[,j,drop=F]) == x[i,j]) & (rowSums(x[i,,drop=F])==x[i,j])) x[i, j] <- 0L
}
}
}
# remove columns with no data
is_colZero <- colSums(x==0)== nrow(x)
if(any(is_colZero)) print(paste('this column is empty and removed: ', which(is_colZero)))
x <- x[,!is_colZero]
# remove rows with no data
is_rowZero <- rowSums(x==0)==ncol(x)
if(any(is_rowZero)) print(paste('this row is empty and removed: ', which(is_rowZero)))
x <- x[!is_rowZero,]
n <- ncol(x)
m <- nrow(x)
max_size <- 0L
max_sum <- 0L
jump_i <- 0L
jump_j <- 0L
for (i in n:1){ # cols
# all possible combination
next_max <- m * (i-1)
if(max_size!=0 & next_max < max_size & i * m < max_size) {
jump_i <- jump_i + 1
next()
}
comb_col <- combinations(n,i)
for (k in 1:nrow(comb_col)){
col <- as.integer(comb_col[k,])
for(j in m:1){ # rows
if (i*j < max_size ) {
jump_j <- jump_j +1
next()
}
comb_row <- combinations(m,j)
for (l in 1:nrow(comb_row)){
row <- as.integer(comb_row[l,])
y <- x[row, col, drop=F]
if(all(y > 0) & max_size <= length(row)*length(col)){
if(max_size == length(row)*length(col)){
if(sum(y) > max_sum){
max_size <- length(row) * length(col)
max_cols <- col
max_rows <- row
max_sum <- sum(y)}
} else {
max_size <- length(row) * length(col)
max_cols <- col
max_rows <- row
max_sum <- sum(y)
}
}
}
}
}
}
return(x[max_rows,max_cols, drop=F])
}
希望这对您有用,如有任何问题,请给我发送电子邮件。