Question

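I have a data frame with several hundred rows and 10 columns, and I need to find the rows whose sum falls within a given range. I have searched for permutations, combinations, and the subset-sum problem, but none of the solutions I found give the desired result.

Please let me know whether there is a solution to this kind of problem. Is there a function, or any vectorized approach, for this kind of "iteration" in R?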

# sample dataframe
x <- data.frame(a=c("A","B","C","D"),b=c(1,2,1,1))

Suppose any accepted combination must sum to 3; then the desired result could be a list like this:

[[1]]     # combination 1
[1] 
1,2       # lists all rows used
[2]  
1,2      # lists all values used

[[2]]     # combination 2
[1]       
2,1       # lists all rows used
[2]
2,1       # lists all values used

[[3]]     # combination 3
[1]       
2,4       # lists all rows used
[2]       
2,1       # lists all values used

[[4]]     # combination 4
[1]
1,3,4     # lists all rows used
[2]       
1,1,1     # lists all values used

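(# comments: added for explanation only)

Note:

Not all possible combinations are required, and not every value has to be used!
A row may be used only ONCE in a given combination (i.e. summing row 3 three times is not an option!)
A combination can be sum(x[1:2,2]) as well as (x[1,2] + x[2,2] + .... + x[n,2])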
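
As a rough sketch of what is being asked for, the snippet below enumerates every subset of rows of the sample data frame with combn and keeps those whose sum lies in a range. The helper name find_combos and its arguments lower and upper are made up for illustration; this is a brute-force approach under the assumption that the number of rows is small, not a general subset-sum solver.

```r
x <- data.frame(a = c("A", "B", "C", "D"), b = c(1, 2, 1, 1))

# Hypothetical helper: return every set of rows whose values sum to
# something between lower and upper (inclusive). Each row is used at most once.
find_combos <- function(values, lower, upper = lower) {
  # Index sets of size 1, 2, ..., n, flattened into one list
  combos <- unlist(
    lapply(seq_along(values),
           function(k) combn(length(values), k, simplify = FALSE)),
    recursive = FALSE
  )
  sums <- vapply(combos, function(rows) sum(values[rows]), numeric(1))
  keep <- sums >= lower & sums <= upper
  lapply(combos[keep], function(rows) list(rows = rows, values = values[rows]))
}

res <- find_combos(x$b, 3)
str(res)
```

For the sample data this yields the row sets (1, 2), (2, 3), (2, 4) and (1, 3, 4). The number of subsets doubles with every extra row, so this only works as written for a couple of dozen rows at most.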

Answer 1

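I hope I understand your question correctly. Suppose we have some matrix dat and want to sum up (for each column) different combinations of rows. We can do this with the *apply family of functions together with combn.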

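Here is what we do:

Loop over the columns of the matrix (using apply)
For each column of the matrix, sum up the unique combinations of rows (using lapply and apply)
The unique combinations are generated with combn, called inside sapply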
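
To make the role of combn more concrete, here is a small standalone illustration with 4 row indices instead of 5. The variable names pairs, all_sizes and n_per_size are only for this example. It uses lapply rather than sapply for the outer loop, because sapply may simplify its result into a vector or matrix when all pieces happen to have the same length, whereas lapply always returns a list.

```r
# All ways to pick 2 of 4 row indices; each column is one combination
pairs <- combn(1:4, 2)
print(pairs)

# One matrix of combinations per combination size (1, 2, 3 and 4 rows)
all_sizes <- lapply(1:4, function(k) combn(1:4, k))

# Number of combinations for each size: 4 6 4 1
n_per_size <- sapply(all_sizes, ncol)
print(n_per_size)
```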

Generate sample `dat`

set.seed(123)
dat <- matrix(rnorm(5 * 6), nrow = 5, ncol = 6)

Loop through `dat`

big_list <- apply(dat, 2, FUN = function(matcol) # over the columns of dat
  lapply(sapply(1:5, FUN = function(x) combn(1:5, x)), # loop through unique combinations of rows in dat
         FUN = function(combs) 
           apply(combs, 2, #over the columns of unique combinations
                 FUN = function(rows) 
                   data.frame(
                     'rows_used' = paste(rows, collapse = ', '), 
                     'n_rows' = length(rows), 
                     'sum' = sum(matcol[rows]))))) #sum up the rows

[[1]] # column 
[[1]][[1]] #[[n_rows]][[n_comb]]
  rows_used n_rows        sum
1         1      1 -0.5604756

[[1]][[2]]
  rows_used n_rows        sum
1         2      1 -0.2301775

[[1]][[3]]
  rows_used n_rows      sum
1         3      1 1.558708

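After going through every column and every row combination, we can extract the data from the list into a data.frame. For example, suppose we are interested in the sums for column 6:

Using the results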

column <- 6
df_from_list <- do.call('rbind',
        lapply(big_list[[column]], 
                 FUN = function(x) do.call('rbind', x)))

       rows_used n_rows        sum
1              1      1 -1.6866933
2              2      1  0.8377870
3              3      1  0.1533731
4              4      1 -1.1381369
5              5      1  1.2538149

We can then use the subset function (or dplyr::filter) to get all combinations of n rows in column 6 whose sum is >= 0 and <= 0.5:

subset(df_from_list, sum >= 0 & sum <= .5)

   rows_used n_rows       sum
3          3      1 0.1533731
15      4, 5      2 0.1156780
18   1, 2, 5      3 0.4049087
25   3, 4, 5      3 0.2690511
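
Since rows_used is stored as a comma-separated string, one possible way to get back to the actual row indices and values (which is what the question asks for) is to split the string again. This is only a sketch; the variable names rows and values are illustrative, and it assumes the same seed and the ", " separator used above.

```r
set.seed(123)
dat <- matrix(rnorm(5 * 6), nrow = 5, ncol = 6)

rows_used <- "1, 2, 5"   # one entry of the rows_used column from the result above

# Turn the string back into integer row indices
rows <- as.integer(strsplit(rows_used, ", ", fixed = TRUE)[[1]])

# Look up the values in column 6 that make up this combination
values <- dat[rows, 6]
list(rows = rows, values = values, sum = sum(values))
```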

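Side note

It would not be surprising if this approach scaled poorly, and I am sure there are more efficient solutions. The way I structured the solution leads to a nested list, which means the user should be comfortable working with list objects in R.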
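
To put a number on the scaling concern: with n rows there are 2^n - 1 non-empty row combinations per column. The short sketch below counts them (the function name n_combinations is just for illustration).

```r
# Number of non-empty row combinations for n rows; equals 2^n - 1
n_combinations <- function(n) sum(choose(n, seq_len(n)))

n_combinations(5)    # 31, one per combination of the 5 rows in dat
n_combinations(20)   # 1048575
n_combinations(100)  # roughly 1.27e30
```

So enumerating everything is fine for the 5-row example, but for a data frame with hundreds of rows some restriction (for instance a maximum combination size) would be needed.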
