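I'm testing many models and want to build an output vector containing every possible combination of the input strings, as long as no letter is repeated within a combination. For example: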
inputdata <- c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")
The output would look like this:
outputdata <- c("A1 + B1 + C1", "A2 + B1 + C1","A3 + B1 + C1", "A1 + B2 + C1", "A1 + B3 + C1", "A1 + B1 + C2", "A1 + B1 + C3", "A2 + B2 + C1", "A2 + B2 + C2", "A3 + B2 + C2", "A3 + B3 + C2", "A3 + B3 + C3")
I've gotten most of the way there with this code:
library(gtools)
dataformodel <- data.frame(combinations(9,3,inputdata))
dataformodel$x <- apply( dataformodel[, ] , 1 , paste , collapse = "+" )
dataformodel <- dataformodel[, -c(1:3)]
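The problem is that this repeats the same letter within a combination, e.g. "A1 + A2 + B1".
My idea was to count the unique letters in each string and drop the rows where the count is < 3, but I've had no luck so far (using the stringr package). Any suggestions?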
Answer 0 (score: 3)
split(inputdata, substr(inputdata,1,1))
# $A
# [1] "A1" "A2" "A3"
# $B
# [1] "B1" "B2" "B3"
# $C
# [1] "C1" "C2" "C3"
If we call expand.grid on this, it gives us every combination of one A, one B, and one C:
head( do.call(expand.grid, split(inputdata, substr(inputdata,1,1))) )
# A B C
# 1 A1 B1 C1
# 2 A2 B1 C1
# 3 A3 B1 C1
# 4 A1 B2 C1
# 5 A2 B2 C1
# 6 A3 B2 C1
Now we can paste(..., collapse="+") each row:
apply(do.call(expand.grid, split(inputdata, substr(inputdata,1,1))), 1, paste, collapse="+")
# [1] "A1+B1+C1" "A2+B1+C1" "A3+B1+C1" "A1+B2+C1" "A2+B2+C1" "A3+B2+C1"
# [7] "A1+B3+C1" "A2+B3+C1" "A3+B3+C1" "A1+B1+C2" "A2+B1+C2" "A3+B1+C2"
# [13] "A1+B2+C2" "A2+B2+C2" "A3+B2+C2" "A1+B3+C2" "A2+B3+C2" "A3+B3+C2"
# [19] "A1+B1+C3" "A2+B1+C3" "A3+B1+C3" "A1+B2+C3" "A2+B2+C3" "A3+B2+C3"
# [25] "A1+B3+C3" "A2+B3+C3" "A3+B3+C3"
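As a possible alternative to the row-wise apply, the columns of the grid can be pasted together in a single vectorised call. This is only a sketch: the variable names `groups`, `grid`, and `combos` are made up for illustration, and `stringsAsFactors = FALSE` is added here so the columns stay character vectors.

```r
inputdata <- c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")

# Group the inputs by their leading letter: list(A = ..., B = ..., C = ...)
groups <- split(inputdata, substr(inputdata, 1, 1))

# One row per combination of one element from each group
grid <- do.call(expand.grid, c(groups, stringsAsFactors = FALSE))

# Paste the columns element-wise instead of looping over rows with apply()
combos <- do.call(paste, c(grid, sep = "+"))

head(combos)
```

Because paste works column-wise here, no row-by-row conversion of the data frame is involved.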
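Note: expand.grid can run out of memory on large inputs. Using apply on a data.frame is usually discouraged, but this is one of the few times it is safe and reasonable, because we know all of its columns are of the same class.
Answer 1 (score: 3)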
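We can filter 'dataformodel' to remove the rows that contain the same letter more than once (run this on the three-column data frame, before the columns are pasted together):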
dataformodel <- dataformodel[!apply(sapply(dataformodel,
sub, pattern = "\\d+", replacement = ""), 1, anyDuplicated),]
Then apply the OP's code to get the output:
head(dataformodel)
#[1] "A1+B1+C1" "A1+B1+C2" "A1+B1+C3" "A1+B2+C1" "A1+B2+C2" "A1+B2+C3"
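For reference, the filter-then-paste steps can be sketched end to end in base R. This is an illustration under stated assumptions rather than the answer's exact code: `combn()` from the utils package is swapped in for `gtools::combinations()`, and the names `combos`, `letters_only`, `keep`, and `result` are hypothetical.

```r
inputdata <- c("A1", "A2", "A3", "B1", "B2", "B3", "C1", "C2", "C3")

# All 3-element combinations, one per row (assumption: combn() stands in
# for gtools::combinations(); it returns combinations as columns, hence t())
combos <- t(combn(inputdata, 3))

# Strip the digits so only the letter of each element remains;
# assigning into letters_only[] keeps the matrix dimensions
letters_only <- combos
letters_only[] <- sub("\\d+", "", combos)

# Keep only the rows in which no letter occurs twice
keep <- apply(letters_only, 1, anyDuplicated) == 0

# Paste the surviving rows into "A1+B1+C1"-style strings
result <- apply(combos[keep, , drop = FALSE], 1, paste, collapse = "+")

head(result)
```

Of the 84 combinations, 27 survive the filter (one element from each of the three letters).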
A faster way to generate the combinations is to use RcppAlgos:
library(RcppAlgos)
dataformodel <- comboGeneral(inputdata, m = 3, repetition = FALSE)