Say I have two vectors:
upVariables<-c("up1", "up2", "up3", "up4", "up5")
downVariables<-c("down1", "down2", "down3", "down4", "down5")
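Each of these will be used to look up a number in another vector. I'm looking for all possible sets of two ratios (all possible sets of four variables, two from each vector), where the numerator always comes from upVariables, the denominator always comes from downVariables, and the final set never uses the same variable twice.
So far I have: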
upCombos<-combn(upVariables,2)
downCombos<-combn(downVariables,2)
combos<-arrange(expand.grid(upCombos=upCombos[,1],downCombos=downCombos[,1]),upCombos)
I'm only using the first possible combination here to illustrate, but I want to iterate over all possible combinations. This gives me:
> combos
upCombos downCombos
1 up1 down1
2 up1 down2
3 up2 down1
4 up2 down2
What I want to produce from this is two sets, such as:
> combos[1]
upCombos downCombos
1 up1 down1
2 up2 down2
and
> combos[2]
upCombos downCombos
1 up1 down2
2 up2 down1
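So in each case, every value from upCombos is used exactly once and every value from downCombos is used exactly once. Does that make sense? Any ideas on how to do this?
Ideally I'd like to generalize this to 3 groups sampled from the original vectors rather than 2, but I'd be happy to get 2 groups working for now.
**Edit: Jota has provided a solution that gives the arrangements within any one set of 4 variables (2 from upVariables, 2 from downVariables). I still don't see how to iterate over all possible sets of 4 variables. The closest I've come is putting Jota's suggestion inside two for loops (as you can tell, I'm not an R programmer yet). This returns far fewer combinations than it should.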
n<-2
offset<-n-1
ratioGroups<-character(0) # collects the collapsed groups
for (i in 1:(length(upVariables)-offset)){
  for (j in 1:(length(downVariables)-offset)){
    # only adjacent windows of n variables are visited here
    combos <- expand.grid(upVariables[i:(i+offset)], downVariables[j:(j+offset)])
    combos <- combos[with(combos, order(Var1)), ] # use dplyr::arrange if you prefer
    mat <- matrix(1:n^2, byrow = TRUE, nrow = n)
    # k rather than j, so the inner loop does not overwrite the outer index
    for(k in 2:nrow(mat) ) mat[k, ] <- mat[k, c(k:ncol(mat), 1:(k - 1))]
    pairs<-(split(combos[c(mat), ], rep(1:n, each = n)))
    collapsed<-sapply(lapply(pairs, apply, 1, paste, collapse = '_'), paste, collapse = '-')
    ratioGroups<-c(ratioGroups,collapsed)
  }
}
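This returns only 16 sets of variables (each set has 2 arrangements, 32 in total). With 5 variables in each vector, though, there are many more possibilities.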
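As a rough sanity check on those numbers (a sketch added for illustration, with variable names that are not part of the original attempt): choosing 2 of 5 up variables and 2 of 5 down variables, then pairing them in every possible way, should give many more groups than the 16 adjacent windows the loops visit.

```r
n <- 2
nUp <- 5
nDown <- 5

# sets of four variables: n chosen from each vector
nSets <- choose(nUp, n) * choose(nDown, n)

# each set can be paired up in factorial(n) ways
nArrangements <- nSets * factorial(n)

# the nested loops above only visit adjacent windows such as up1:up2, up2:up3, ...
nWindows <- (nUp - (n - 1)) * (nDown - (n - 1))

c(sets = nSets, arrangements = nArrangements, windows = nWindows)
#>         sets arrangements      windows
#>          100          200           16
```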
Answer 0: (score: 0)
You can use expand.grid to create the combinations and prepare the subsets using regular expressions:
upVariables<-c("up1", "up2", "up3", "up4", "up5")
downVariables<-c("down1", "down2", "down3", "down4", "down5")
DF = expand.grid(upVariables,downVariables)
DF$suffix1 = as.numeric(unlist(regmatches(DF$Var1,gregexpr("[0-9]+",DF$Var1))))
DF$suffix2 = as.numeric(unlist(regmatches(DF$Var2,gregexpr("[0-9]+",DF$Var2))))
head(DF)
# Var1 Var2 suffix1 suffix2
#1 up1 down1 1 1
#2 up2 down1 2 1
#3 up3 down1 3 1
#4 up4 down1 4 1
#5 up5 down1 5 1
#6 up1 down2 1 2
DF_Comb1 = DF[DF$suffix1==DF$suffix2,]
DF_Comb2 = DF[DF$suffix1!=DF$suffix2,]
DF_Comb1
# Var1 Var2 suffix1 suffix2
# 1 up1 down1 1 1
# 7 up2 down2 2 2
# 13 up3 down3 3 3
# 19 up4 down4 4 4
# 25 up5 down5 5 5
head(DF_Comb2)
# Var1 Var2 suffix1 suffix2
# 2 up2 down1 2 1
# 3 up3 down1 3 1
# 4 up4 down1 4 1
# 5 up5 down1 5 1
# 6 up1 down2 1 2
# 8 up3 down2 3 2
Answer 1: (score: 0)
Here is what I came up with based on the comments and the edited question.
# create combos and order them according to the first variable
combos <- expand.grid(upVariables[1:2], downVariables[1:2])
combos <- combos[with(combos, order(Var1)), ] # use dplyr::arrange if you prefer
# if names are important, set them:
# names(combos) <- c("upCombos", "downCombos")
# create a matrix to use to sort combos
mat <- matrix(1:2^2, byrow = TRUE, nrow = 2)
# take some code from Carl Witthoft to shift the above matrix
# from: http://stackoverflow.com/a/24144632/640595
for(j in 2:nrow(mat) ) mat[j, ] <- mat[j, c(j:ncol(mat), 1:(j - 1))]
# use the matrix to sort combos, and then conduct the splitting
initialResult <- split(combos[c(mat), ], rep(1:2, each = 2))
$`1`
  Var1  Var2
1  up1 down1
4  up2 down2

$`2`
  Var1  Var2
3  up1 down2
2  up2 down1
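To see why the shifted matrix picks out non-overlapping pairs, it may help to look at the index matrix on its own. The sketch below repeats only the matrix step for n = 2; `idx` is a name introduced here for illustration.

```r
n <- 2
mat <- matrix(seq_len(n^2), byrow = TRUE, nrow = n)
# before shifting, row r holds the row positions in combos for the r-th up variable:
#      [,1] [,2]
# [1,]    1    2
# [2,]    3    4

# rotate row j to the left by j - 1 places
for (j in 2:nrow(mat)) mat[j, ] <- mat[j, c(j:ncol(mat), 1:(j - 1))]

# reading down the columns gives the row order used to index combos
idx <- c(mat)
idx
#> [1] 1 4 2 3
```

Each column of the shifted matrix takes one row per up variable and a different down variable in each row, so positions 1 and 4 form the first group and positions 2 and 3 the second. Note that cyclic shifts produce only n groups per set; for n = 2 that is every possible pairing, but for n = 3 it would cover 3 of the 6 possible pairings.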
To generate the rest of the combinations, we can iterate over the up variables and down variables and substitute them in:
# use regular expressions with the stringi package to produce the rest of the combinations.
library(stringi)
# convert from factor to character for easier manipulation
initialResult <- lapply(initialResult, sapply, as.character)
# iterate through the columns of upCombos
intermediateResult <- lapply(seq_len(dim(upCombos)[2]),
function(ii) {
jj <- stri_replace_all_fixed(unlist(initialResult),
pattern = c("up1", "up2"),
replacement = c(upCombos[, ii]))
relist(jj, initialResult)})
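# iterate through columns of downCombos
# vectorize_all = FALSE applies the replacements one after another, so "down2"
# is replaced before "down1"; otherwise a column such as c("down2", "down3")
# would turn "down1" into "down2" and then both into "down3"
finalResult <- lapply(seq_len(dim(downCombos)[2]),
function(ii) {
jj <- stri_replace_all_fixed(unlist(intermediateResult),
pattern = c("down2", "down1"),
replacement = rev(downCombos[, ii]), vectorize_all = FALSE)
relist(jj, intermediateResult)})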
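Answer 2: (score: 0)
So I think I may have cracked it. I've borrowed from several answers to other questions. There is a function called expand.grid.unique (here) that removes the duplicates you get when you pass the same vector to expand.grid twice. There is another one (here) called expand.grid.df, which I won't even pretend to understand, that extends expand.grid to work on data frames. Combined, though, they do what I want them to do.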
upVariables<-c("up1", "up2", "up3", "up4", "up5")
downVariables<-c("down1", "down2", "down3", "down4", "down5")
ratioGroups<-data.frame(matrix(ncol=2, nrow=0))
colnames(ratioGroups)<-c("mix1","mix2")
ups<-expand.grid.unique(upVariables,upVariables)
downs<-expand.grid.unique(downVariables,downVariables)
comboList<-expand.grid.df(ups,downs)
comboList <- data.frame(lapply(comboList, as.character), stringsAsFactors=FALSE)
colnames(comboList)<-c("u1","u2","d1","d2")
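There is a lot of fiddling in there to convert everything back to strings, because for some reason everything gets converted to factors.
If I put Jota's answer into a function: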
getGroups<-function(line){
n<-2 #the number ratios being used.
combos <- expand.grid(as.character(line[1:2]), as.character(line[3:4]))
combos <- combos[with(combos, order(Var1)), ] # use dplyr::arrange if you prefer
mat <- matrix(1:n^2, byrow = TRUE, nrow = n)
for(j in 2:nrow(mat) ) mat[j, ] <- mat[j, c(j:ncol(mat), 1:(j - 1))]
pairs<-(split(combos[c(mat), ], rep(1:n, each = n)))
collapsed<-sapply(lapply(pairs, apply, 1, paste, collapse = '_'), paste, collapse = '-')
}
then I can use
ratiosGroups<-as.vector(apply(comboList,1,getGroups))
to return a list of all possible combinations. I'd guess this still isn't the best way to achieve my larger goal, but it gets there.
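For the generalization to 3 or more ratios mentioned in the question, one possible base-R sketch is below. It is not the approach used in the answers above: it enumerates every set with combn and every pairing with a small permutation helper. The names `perms` and `allRatioGroups` are made up for this example, and the output uses the same "up_down-up_down" string format as getGroups.

```r
upVariables <- c("up1", "up2", "up3", "up4", "up5")
downVariables <- c("down1", "down2", "down3", "down4", "down5")

# hypothetical helper: all permutations of 1..n, one per row
perms <- function(n) {
  if (n == 1) return(matrix(1L, 1, 1))
  sub <- perms(n - 1)
  do.call(rbind, lapply(seq_len(n), function(i) {
    # put i first, then every ordering of the remaining values
    unname(cbind(i, matrix(seq_len(n)[-i][sub], nrow = nrow(sub))))
  }))
}

# hypothetical helper: every group of n ratios with no variable reused
allRatioGroups <- function(up, down, n = 2) {
  upSets <- combn(up, n)       # one set of n up variables per column
  downSets <- combn(down, n)   # one set of n down variables per column
  p <- perms(n)                # every way to match downs to ups
  grid <- expand.grid(k = seq_len(nrow(p)),
                      d = seq_len(ncol(downSets)),
                      u = seq_len(ncol(upSets)))
  mapply(function(u, d, k) {
    paste(paste(upSets[, u], downSets[p[k, ], d], sep = "_"), collapse = "-")
  }, grid$u, grid$d, grid$k)
}

res2 <- allRatioGroups(upVariables, downVariables, n = 2)
head(res2, 2)
#> [1] "up1_down1-up2_down2" "up1_down2-up2_down1"
length(res2)
#> [1] 200

res3 <- allRatioGroups(upVariables, downVariables, n = 3)
length(res3)
#> [1] 600
```

Enumerating permutations rather than cyclic shifts means all n! pairings are produced for each set, which matters once n is 3 or more. The result grows quickly, so this is only practical for small vectors.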