Split a vector into all combinations of given lengths and compute summary statistics

Date: 2018-03-27 20:31:19

Tags: r combinations

I have a vector of 5 values:

data <- c(42.3, 51.5, 53.7, 53.1, 50.7)

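I want to split these values into one vector of length 2 and one vector of length 3. In addition, I want to create every possible combination of such length-3 and length-2 vectors (in this case, 10 ways).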
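The count of 10 can be checked directly: choosing which 3 of the 5 values go into the longer vector fixes the remaining 2, so the number of splits is the binomial coefficient "5 choose 3". A minimal sketch (the name `n_splits` is only illustrative):

```r
data <- c(42.3, 51.5, 53.7, 53.1, 50.7)

# Number of ways to pick the 3 values of the longer vector;
# the other 2 values are then determined
n_splits <- choose(length(data), 3)
n_splits

# Picking the 2 values of the shorter vector gives the same count
choose(length(data), 2)
```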

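Here is an example of one such combination, in which every value in "data" appears exactly once per column:

[Image: one example split of "data" into a length-3 vector and a length-2 vector]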

So far, I have this code to create all combinations of length 3:

table1 <- combn(data, 3)

This gives me the first table, with the output:

> table1
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 42.3 42.3 42.3 42.3 42.3 42.3 51.5 51.5 51.5  53.7
[2,] 51.5 51.5 51.5 53.7 53.7 53.1 53.7 53.7 53.1  53.1
[3,] 53.7 53.1 50.7 53.1 50.7 50.7 53.1 50.7 50.7  50.7

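My plan is to create a second table by taking the difference between the original vector ("data") and each column of "table1", which would give the corresponding length-2 vectors. However, I can't figure out how to do this.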

Once that is done, I plan to compute summary statistics (mean, sd, etc.) for each column and then compare the two tables.

The problem is that summary(table1) works column by column, but sd(table1) does not. I want my output to look like this:

> summary(table1)
       V1              V2              V3              V4             V5             V6             V7              V8       
 Min.   :42.30   Min.   :42.30   Min.   :42.30   Min.   :42.3   Min.   :42.3   Min.   :42.3   Min.   :51.50   Min.   :50.70  
 1st Qu.:46.90   1st Qu.:46.90   1st Qu.:46.50   1st Qu.:47.7   1st Qu.:46.5   1st Qu.:46.5   1st Qu.:52.30   1st Qu.:51.10  
 Median :51.50   Median :51.50   Median :50.70   Median :53.1   Median :50.7   Median :50.7   Median :53.10   Median :51.50  
 Mean   :49.17   Mean   :48.97   Mean   :48.17   Mean   :49.7   Mean   :48.9   Mean   :48.7   Mean   :52.77   Mean   :51.97  
 3rd Qu.:52.60   3rd Qu.:52.30   3rd Qu.:51.10   3rd Qu.:53.4   3rd Qu.:52.2   3rd Qu.:51.9   3rd Qu.:53.40   3rd Qu.:52.60  
 Max.   :53.70   Max.   :53.10   Max.   :51.50   Max.   :53.7   Max.   :53.7   Max.   :53.1   Max.   :53.70   Max.   :53.70  
       V9             V10      
 Min.   :50.70   Min.   :50.7  
 1st Qu.:51.10   1st Qu.:51.9  
 Median :51.50   Median :53.1  
 Mean   :51.77   Mean   :52.5  
 3rd Qu.:52.30   3rd Qu.:53.4  
 Max.   :53.10   Max.   :53.7  

and not like this:

> sd(table1)
[1] 4.193394
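The single number appears because sd() treats the matrix as one numeric vector of 30 values, whereas summary() has a matrix method that summarises each column. One way to get per-column values is apply() with MARGIN = 2. A minimal sketch; the names `col_sd`, `col_mean` and `stats1` are illustrative and not part of the original post:

```r
data <- c(42.3, 51.5, 53.7, 53.1, 50.7)
table1 <- combn(data, 3)

# Apply sd() to each column (MARGIN = 2) instead of to the whole matrix
col_sd <- apply(table1, 2, sd)

# colMeans() is the built-in shortcut for per-column means
col_mean <- colMeans(table1)

# Stack the two statistics into a 2 x 10 matrix, one column per combination
stats1 <- rbind(mean = col_mean, sd = col_sd)
round(stats1, 2)
```

The same pattern works for any function that takes a numeric vector, for example apply(table1, 2, median).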

Any help is appreciated, thanks.

2 answers:

Answer 0: (score: 1)

I solved the first half with this code:

# Return the values of data that are not in the given column
mnumber <- function(input) {
  setdiff(data, input)
}

table2 <- apply(table1, 2, mnumber)

This gives me the output:

> table2
     [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
[1,] 53.1 53.7 53.7 51.5 51.5 51.5 42.3 42.3 42.3  42.3
[2,] 50.7 50.7 53.1 50.7 53.1 53.7 50.7 53.1 53.7  51.5

Now I just need to analyze the data.
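For the analysis step, one possible approach is to compute the per-column mean and sd of both tables and put them side by side. Note that setdiff() works on values and drops duplicates, so it is only safe while all values in data are distinct, as they are here. The sketch below builds both tables from positions instead, which avoids that assumption; the names `idx3` and `comparison` are illustrative only:

```r
data <- c(42.3, 51.5, 53.7, 53.1, 50.7)

# Each column of idx3 holds the positions of one length-3 group
idx3 <- combn(seq_along(data), 3)

# Length-3 groups, and their length-2 complements selected by position
table1 <- apply(idx3, 2, function(k) data[k])
table2 <- apply(idx3, 2, function(k) data[-k])

# One row per split: statistics of the length-3 and length-2 groups
comparison <- data.frame(
  mean3 = colMeans(table1),
  sd3   = apply(table1, 2, sd),
  mean2 = colMeans(table2),
  sd2   = apply(table2, 2, sd)
)
comparison
```

With distinct values this yields the same table1 and table2 as shown above, since data[-k] keeps the remaining values in their original order.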

Answer 1: (score: 0)

Use combn on the indices of the vector and apply a function to each combination:

i <- seq_along(data)
l <- combn(i, 3, FUN = function(cmb) {
  lapply(list(data[cmb], data[setdiff(i, cmb)]), function(v) {
    c(summary(v), sd = sd(v))
  })
}, simplify = FALSE)
l[1]
[[1]]
[[1]][[1]]
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.        sd 
42.300000 46.900000 51.500000 49.166667 52.600000 53.700000  6.047589 

[[1]][[2]]
     Min.   1st Qu.    Median      Mean   3rd Qu.      Max.        sd 
50.700000 51.300000 51.900000 51.900000 52.500000 53.100000  1.697056
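The nested list can be awkward to compare across splits. As a possible follow-up, and not part of the original answer, the results could be collected into one matrix per group size with one row per split. The names `len3` and `len2` are illustrative:

```r
data <- c(42.3, 51.5, 53.7, 53.1, 50.7)
i <- seq_along(data)

# Same construction as above: for each split, summarise both groups
l <- combn(i, 3, FUN = function(cmb) {
  lapply(list(data[cmb], data[setdiff(i, cmb)]), function(v) {
    c(summary(v), sd = sd(v))
  })
}, simplify = FALSE)

# Pull out the first (length-3) and second (length-2) element of each split
# and transpose so that each row is one split and each column one statistic
len3 <- t(sapply(l, `[[`, 1))
len2 <- t(sapply(l, `[[`, 2))

round(len3, 2)
round(len2, 2)
```

Row k of len3 and row k of len2 describe the two halves of the same split, so the two matrices can be compared row by row.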