R

Time: 2016-11-15 01:59:32

Tags: r

Say I have two vectors:

upVariables<-c("up1", "up2", "up3", "up4", "up5")
downVariables<-c("down1", "down2", "down3", "down4", "down5")

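Each of these will be used to look up numbers in another vector. I'm looking for all possible sets of two ratios (that is, all possible sets of four variables, two from each vector), where the numerator always comes from upVariables, the denominator always comes from downVariables, and the final set never uses the same variable twice.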

What I have so far:

library(dplyr)  # provides arrange()
upCombos <- combn(upVariables, 2)
downCombos <- combn(downVariables, 2)
combos <- arrange(expand.grid(upCombos = upCombos[, 1], downCombos = downCombos[, 1]), upCombos)

I'm only using the first possible combination here for illustration, but I want to iterate over all possible combinations. This gives me:

> combos
  upCombos downCombos
1      up1      down1
2      up1      down2
3      up2      down1
4      up2      down2

What I want to produce from this is two sets, like:

> combos[1]
  upCombos downCombos
1      up1      down1
2      up2      down2

> combos[2]
  upCombos downCombos
1      up1      down2
2      up2      down1

So in each case, every value from upCombos is used exactly once and every value from downCombos is used exactly once. Does that make sense? Any ideas on how to do this?

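Ideally I'd like to be able to generalize this to sets of 3 ratios sampled from the original vectors rather than 2, but I'd be happy to get 2 working for now.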

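Edit: Jota has provided a solution that gives the arrangements within any one set of 4 variables (2 from upVariables, 2 from downVariables). I still don't see how to iterate over all possible sets of 4 variables. The closest I've got is putting Jota's suggestion inside two for loops (as you can tell, I'm not an R programmer yet). This returns far fewer combinations than it should.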

n <- 2
offset <- n - 1
ratioGroups <- c()  # collect results across iterations
for (i in 1:(length(upVariables) - offset)) {
  for (j in 1:(length(downVariables) - offset)) {
    combos <- expand.grid(upVariables[i:(i + offset)], downVariables[j:(j + offset)])
    combos <- combos[with(combos, order(Var1)), ]  # use dplyr::arrange if you prefer
    mat <- matrix(1:n^2, byrow = TRUE, nrow = n)
    # rotate row k of mat left by k - 1 (separate index so the outer j is not overwritten)
    for (k in 2:nrow(mat)) mat[k, ] <- mat[k, c(k:ncol(mat), 1:(k - 1))]
    pairs <- split(combos[c(mat), ], rep(1:n, each = n))
    collapsed <- sapply(lapply(pairs, apply, 1, paste, collapse = '_'), paste, collapse = '-')
    ratioGroups <- c(ratioGroups, collapsed)
  }
}

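This returns only 16 sets of variables (each with 2 arrangements, so 32 in total). With 5 variables in each vector, there are many more possibilities than that.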
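For reference, the expected total can be worked out directly. This is a small sketch of the arithmetic, assuming each set picks n variables from each vector and that every way of pairing the chosen numerators with the chosen denominators counts as a separate arrangement; the variable names are illustrative only.

```r
nVars <- 5  # variables in each of upVariables and downVariables
n <- 2      # ratios per set

# Ways to choose n up variables times ways to choose n down variables.
nSets <- choose(nVars, n)^2

# Ways to pair the chosen numerators with the chosen denominators.
nArrangements <- factorial(n)

nSets                  # 100 sets of four variables
nSets * nArrangements  # 200 arrangements in total
```

The double loop above only visits adjacent windows such as up1/up2 or up2/up3, which is why it reaches 16 sets rather than 100.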

3 Answers:

Answer 0 (score: 0)

You can use expand.grid to create the combinations and then subset them with the help of regular expressions:

upVariables<-c("up1", "up2", "up3", "up4", "up5")
downVariables<-c("down1", "down2", "down3", "down4", "down5")

DF = expand.grid(upVariables,downVariables)

DF$suffix1 = as.numeric(unlist(regmatches(DF$Var1,gregexpr("[0-9]+",DF$Var1))))

DF$suffix2 = as.numeric(unlist(regmatches(DF$Var2,gregexpr("[0-9]+",DF$Var2))))

head(DF)
#  Var1  Var2 suffix1 suffix2
#1  up1 down1       1       1
#2  up2 down1       2       1
#3  up3 down1       3       1
#4  up4 down1       4       1
#5  up5 down1       5       1
#6  up1 down2       1       2



DF_Comb1 = DF[DF$suffix1==DF$suffix2,]
DF_Comb2 = DF[DF$suffix1!=DF$suffix2,]

DF_Comb1
#    Var1  Var2 suffix1 suffix2
# 1   up1 down1       1       1
# 7   up2 down2       2       2
# 13  up3 down3       3       3
# 19  up4 down4       4       4
# 25  up5 down5       5       5


head(DF_Comb2)
#   Var1  Var2 suffix1 suffix2
# 2  up2 down1       2       1
# 3  up3 down1       3       1
# 4  up4 down1       4       1
# 5  up5 down1       5       1
# 6  up1 down2       1       2
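The grid of 25 ratios can also be taken one step further to build the sets themselves. What follows is a rough sketch rather than part of the original answer: it picks every group of n rows from the grid and keeps only the groups in which no up variable and no down variable repeats. The names ratios, idx, keep and sets are made up for illustration.

```r
upVariables <- c("up1", "up2", "up3", "up4", "up5")
downVariables <- c("down1", "down2", "down3", "down4", "down5")

# All 25 candidate ratios (numerator from up, denominator from down).
ratios <- expand.grid(up = upVariables, down = downVariables,
                      stringsAsFactors = FALSE)

n <- 2  # number of ratios per set

# Every way of picking n rows of `ratios`; one candidate set per column.
idx <- combn(nrow(ratios), n)

# Keep a candidate only if no up variable and no down variable repeats.
keep <- apply(idx, 2, function(rows) {
  !anyDuplicated(ratios$up[rows]) && !anyDuplicated(ratios$down[rows])
})
sets <- lapply(which(keep), function(k) ratios[idx[, k], ])

length(sets)  # 200 for n = 2
sets[[1]]
```

Setting n to 3 should work the same way, at the cost of filtering a larger number of candidates, since combn() enumerates all groups of rows before the check is applied.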
# 8  up3 down2       3       2

Answer 1 (score: 0)

Here is what I would suggest, based on the comments and the edited question.

# create combos and order them according to the first variable
combos <- expand.grid(upVariables[1:2], downVariables[1:2])
combos <- combos[with(combos, order(Var1)), ]  # use dplyr::arrange if you prefer
# if names are important, set them:
# names(combos) <- c("upCombos", "downCombos")

# create a matrix to use to sort combos
mat <- matrix(1:2^2, byrow = TRUE, nrow = 2)
# take some code from Carl Witthoft to shift the above matrix
# from: http://stackoverflow.com/a/24144632/640595
for(j in 2:nrow(mat) ) mat[j, ] <- mat[j, c(j:ncol(mat), 1:(j - 1))]

# use the matrix to sort combos, and then conduct the splitting
initialResult <- split(combos[c(mat), ], rep(1:2, each = 2))
$`1`
  Var1  Var2
1  up1 down1
4  up2 down2

$`2`
  Var1  Var2
3  up1 down2
2  up2 down1

To generate the remaining combinations, we can iterate and substitute the up variables and the down variables:

# use regular expressions with the stringi package to produce the rest of the combinations.
library(stringi)
# convert from factor to character for easier manipulation
initialResult <- lapply(initialResult, sapply, as.character)

# iterate through the columns of upCombos
intermediateResult <- lapply(seq_len(dim(upCombos)[2]), 
    function(ii) {
        jj <- stri_replace_all_fixed(unlist(initialResult), 
            pattern = c("up1", "up2"), 
            replacement = c(upCombos[, ii]))
        relist(jj, initialResult)})

# iterate through columns of downCombos
finalResult <- lapply(seq_len(dim(downCombos)[2]), 
    function(ii) {
        jj <- stri_replace_all_fixed(unlist(intermediateResult), 
            pattern = c("down1", "down2"), 
            replacement = c(downCombos[, ii]), vectorize_all = FALSE)
        relist(jj, intermediateResult)})
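If you would rather avoid string substitution altogether, the same sets can be built constructively in base R. This is only a sketch under my own assumptions: permutations() and ratioSets() are hypothetical helper names, not functions from any package, and the recursive permutation helper is only sensible for small n.

```r
upVariables <- c("up1", "up2", "up3", "up4", "up5")
downVariables <- c("down1", "down2", "down3", "down4", "down5")

# Hypothetical helper: all permutations of a vector, returned as a list.
permutations <- function(x) {
  if (length(x) <= 1) return(list(x))
  out <- list()
  for (i in seq_along(x)) {
    for (p in permutations(x[-i])) out[[length(out) + 1]] <- c(x[i], p)
  }
  out
}

# Hypothetical helper: every set of n ratios with no variable reused.
ratioSets <- function(up, down, n = 2) {
  out <- list()
  for (u in combn(up, n, simplify = FALSE)) {
    for (d in combn(down, n, simplify = FALSE)) {
      # Each ordering of the chosen denominators gives one arrangement.
      for (p in permutations(d)) {
        out[[length(out) + 1]] <- data.frame(up = u, down = p,
                                             stringsAsFactors = FALSE)
      }
    }
  }
  out
}

res <- ratioSets(upVariables, downVariables, n = 2)
length(res)  # 200
res[[1]]
```

Because the numerators are fixed in order and only the denominators are permuted, each arrangement is produced once, and passing n = 3 gives the three-ratio sets mentioned in the question.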

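Answer 2 (score: 0)

So I think I may have cracked it. I've borrowed from a couple of answers to other questions. There is a function called expand.grid.unique here, which removes the duplicates you get when you pass the same vector to expand.grid twice. There is also one here called expand.grid.df, which I won't even pretend to understand, that extends expand.grid to handle data frames. Combined, however, they do what I want.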

upVariables<-c("up1", "up2", "up3", "up4", "up5")
downVariables<-c("down1", "down2", "down3", "down4", "down5")
ratioGroups<-data.frame(matrix(ncol=2, nrow=0))
colnames(ratioGroups)<-c("mix1","mix2")

ups<-expand.grid.unique(upVariables,upVariables)
downs<-expand.grid.unique(downVariables,downVariables)
comboList<-expand.grid.df(ups,downs)
comboList <- data.frame(lapply(comboList, as.character), stringsAsFactors=FALSE)
colnames(comboList)<-c("u1","u2","d1","d2")

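There's a fair bit of faffing about in there converting everything back to strings, because for some reason everything gets turned into factors.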
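Since the two helper functions come from other answers and are not reproduced here, the following is a rough base R stand-in for readers who want to run the example. It assumes that expand.grid.unique(x, x) yields one row per unordered pair and that expand.grid.df(a, b) pairs every row of a with every row of b; uniquePairs and crossJoinDf are hypothetical names, not the original functions.

```r
upVariables <- c("up1", "up2", "up3", "up4", "up5")
downVariables <- c("down1", "down2", "down3", "down4", "down5")

# Hypothetical stand-in for expand.grid.unique(x, x): one row per unordered pair.
uniquePairs <- function(x) {
  as.data.frame(t(combn(x, 2)), stringsAsFactors = FALSE)
}

# Hypothetical stand-in for expand.grid.df(a, b): Cartesian product of the rows.
crossJoinDf <- function(a, b) {
  merge(a, b, by = NULL)
}

ups <- uniquePairs(upVariables)
downs <- uniquePairs(downVariables)
names(ups) <- c("u1", "u2")
names(downs) <- c("d1", "d2")

comboList <- crossJoinDf(ups, downs)
dim(comboList)  # 100 rows, 4 columns
```

Building the pairs with stringsAsFactors = FALSE keeps the columns as character vectors, so the conversion back from factors should not be needed with this version.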

If I put Jota's answer into a function:

getGroups <- function(line) {
  n <- 2  # the number of ratios being used
  combos <- expand.grid(as.character(line[1:2]), as.character(line[3:4]))
  combos <- combos[with(combos, order(Var1)), ]  # use dplyr::arrange if you prefer
  mat <- matrix(1:n^2, byrow = TRUE, nrow = n)
  for (j in 2:nrow(mat)) mat[j, ] <- mat[j, c(j:ncol(mat), 1:(j - 1))]
  pairs <- split(combos[c(mat), ], rep(1:n, each = n))
  # return the collapsed strings explicitly instead of relying on the value of an assignment
  sapply(lapply(pairs, apply, 1, paste, collapse = '_'), paste, collapse = '-')
}

Then I can use

ratiosGroups<-as.vector(apply(comboList,1,getGroups))

to return a list of all possible combinations. I suspect this still isn't the best way to achieve my larger goal, but it gets there.