Manually building a SIMPER contrast matrix from a data frame in R

Time: 2017-04-25 06:55:26

Tags: r matrix vegan multi-dimensional-scaling

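I am using the simper function from the vegan package. In short, simper compares groups pairwise and calculates which variables contribute most to the dissimilarity between them, and how much, reported in a column named cumsum that gives the cumulative contribution. The output is a nested list with one element per between-group contrast and its results. For example: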

library(vegan)
library(data.table)
library(tidyr)

data(dune)
data(dune.env)
sim <- with(dune.env, simper(dune, Management))
simsum<-summary(sim)

#(short version of output)

$SF_BF
             average          sd     ratio       ava       avb     cumsum
Agrostol 0.061373875 0.034193273 1.7949108 4.6666667 0.0000000 0.09824271
Alopgeni 0.052667124 0.036475863 1.4438897 4.3333333 0.6666667 0.18254830
$SF_HF
             average          sd     ratio       ava avb     cumsum
Agrostol 0.047380081 0.031272715 1.5150613 4.6666667 1.4 0.08350879
Alopgeni 0.046433015 0.032896891 1.4114712 4.3333333 1.6 0.16534834
$SF_NM
             average          sd     ratio       ava       avb    cumsum
Poatriv  0.078284148 0.040947182 1.9118324 4.6666667 0.0000000 0.1013601
Alopgeni 0.071219425 0.046958337 1.5166513 4.3333333 0.0000000 0.1935731

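From this, I am interested in 1) the name of each nested list element (i.e. which groups are being contrasted), 2) the rownames (i.e. which variables contribute to the dissimilarity), and 3) the cumsum column (i.e. how much they contribute).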
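The three pieces can be pulled out of the nested list in one pass. The following is a minimal sketch in base R. `sim_mock` is a hypothetical stand-in typed by hand from the abbreviated output above (only the cumsum column, two rows per contrast), so it runs without vegan. With the real object, `simsum` would take its place, assuming it behaves like a named list of data frames, as the printed output suggests.

```r
# Hypothetical stand-in for summary(sim): a named list of data frames,
# filled with the values printed in the abbreviated output above.
sim_mock <- list(
  SF_BF = data.frame(cumsum = c(0.09824271, 0.18254830),
                     row.names = c("Agrostol", "Alopgeni")),
  SF_HF = data.frame(cumsum = c(0.08350879, 0.16534834),
                     row.names = c("Agrostol", "Alopgeni")),
  SF_NM = data.frame(cumsum = c(0.1013601, 0.1935731),
                     row.names = c("Poatriv", "Alopgeni"))
)

# One row per (contrast, variable): 1) list name, 2) row name, 3) cumsum.
long <- do.call(rbind, lapply(names(sim_mock), function(nm) {
  d <- sim_mock[[nm]]
  data.frame(contrast = nm,
             name     = rownames(d),
             cumsum   = d$cumsum,
             stringsAsFactors = FALSE)
}))
print(long)
```

Keeping the contrast name as an ordinary column from the start avoids having to recover it from row names later.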

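I would like to turn this into a contrast matrix that shows the top three contributing variables for each between-group contrast, so that it is easier to read and does not take up so much space. Here is an example I made in Excel: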

[Image: Excel mock-up of the desired contrast matrix]

I suspect this will be tricky, but here is what I have done so far:

top3 <- lapply(simsum, `[`, 1:3, )  # top 3 contributors per contrast
cuss <- lapply(top3, `[`, 6)        # keep the last column (cumsum)

rows  <- lapply(top3, rownames)     # variable names from each list element
rows2 <- lapply(cuss, cumsum)       # values from each list element

# variable names into a data frame
rowsdf <- do.call(rbind, lapply(rows, data.frame, stringsAsFactors = FALSE))

# values into a data frame
cusumdf <- do.call(rbind, lapply(rows2, data.frame, stringsAsFactors = FALSE))

simperdf <- cbind(rowsdf, cusumdf)          # combine into one data frame

colnames(simperdf) <- c("name", "cusum")    # rename the columns

setDT(simperdf, keep.rownames = TRUE)[]     # convert rownames to a column (rn)

# split the contrast names into the two groups
simperdf <- separate(data = simperdf, col = rn, into = c("left", "right"), sep = "\\_")
# split off the row numbers that rbind appended
simperdf <- separate(data = simperdf, col = right, into = c("right", "delete"), sep = "\\.")
simperdf$delete <- NULL                     # drop the number column

This gives this neat little data frame:

    left right     name      cusum
 1:   SF    BF Agrostol 0.09824271
 2:   SF    BF Alopgeni 0.28079100
 3:   SF    BF Lolipere 0.54036058
 4:   SF    HF Agrostol 0.08350879
 5:   SF    HF Alopgeni 0.24885713
 6:   SF    HF Lolipere 0.48820643
 7:   SF    NM  Poatriv 0.10136013
 8:   SF    NM Alopgeni 0.29493318
 9:   SF    NM Agrostol 0.56167145
10:   BF    HF Rumeacet 0.08163219
11:   BF    HF  Poatriv 0.23357016
12:   BF    HF Planlanc 0.45275349
13:   BF    NM Lolipere 0.12427183
14:   BF    NM  Poatriv 0.32348443
15:   BF    NM  Poaprat 0.59466001
16:   HF    NM  Poatriv 0.09913221
17:   HF    NM Lolipere 0.27381681
18:   HF    NM Rumeacet 0.51298871
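One thing worth checking in the table above: the cumsum column returned by `summary()` is already cumulative, and the code passes it through `cumsum()` once more at the `rows2` step. The short check below uses only the two SF_BF values printed earlier. It suggests that the second row of the table is a running sum of running sums rather than the cumulative contribution that simper reported.

```r
# The two SF_BF values from the cumsum column of the abbreviated output.
reported <- c(Agrostol = 0.09824271, Alopgeni = 0.18254830)

# Applying cumsum() again, as the rows2 step does.
twice <- cumsum(reported)
print(twice)  # the second value lands at about 0.2808, as in row 2 of the table
```

If the original cumulative contributions are what is wanted, using the column as-is (without the extra `cumsum()` call) should be enough.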

But I am not sure where to go from here. I see that contrasts(dune.env$Management) would give the frame of the matrix:

   HF NM SF
BF  0  0  0
HF  1  0  0
NM  0  1  0
SF  0  0  1

But I am not sure how to fill it in manually. Any help would be much appreciated.
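One possible way to fill such a frame by hand, sketched in base R under a few assumptions: the cells hold text (variable name plus value), each contrast goes into the cell at row `right` and column `left`, and `simperdf_mock` is a hypothetical hand-typed copy of the first six rows of the data frame shown above, with the values as printed there. The label format and the matrix orientation are choices made for illustration, not something the Excel example is known to use.

```r
# Hypothetical copy of the first six rows of the tidy data frame above.
simperdf_mock <- data.frame(
  left  = rep("SF", 6),
  right = rep(c("BF", "HF"), each = 3),
  name  = c("Agrostol", "Alopgeni", "Lolipere",
            "Agrostol", "Alopgeni", "Lolipere"),
  cusum = c(0.09824271, 0.28079100, 0.54036058,
            0.08350879, 0.24885713, 0.48820643),
  stringsAsFactors = FALSE
)

# Empty square character matrix with one row and one column per group.
groups <- unique(c(simperdf_mock$left, simperdf_mock$right))
contrast_mat <- matrix("", nrow = length(groups), ncol = length(groups),
                       dimnames = list(groups, groups))

# One label per row of the data frame, e.g. "Agrostol (0.10)".
labels <- sprintf("%s (%.2f)", simperdf_mock$name, simperdf_mock$cusum)

# Group the labels by contrast and write each group into its cell.
pairs <- split(labels, paste(simperdf_mock$right, simperdf_mock$left, sep = "_"))
for (key in names(pairs)) {
  rc <- strsplit(key, "_", fixed = TRUE)[[1]]  # rc[1] = row, rc[2] = column
  contrast_mat[rc[1], rc[2]] <- paste(pairs[[key]], collapse = ", ")
}
print(contrast_mat, quote = FALSE)
```

Cells for contrasts that do not occur in the data stay empty, so the result has the triangular look of a contrast table once all group pairs are included.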

1 Answer:

Answer 0: (score: 1)

This is not exactly what you want, but I think it is a step in the right direction:

require(tables)
test <- data.frame(left = c("SF", "SF", "BF", "BF"), 
                   right = c("BF","BF", "SF", "SF"),
                   name = c("Agrostol", "Alopgeni","Agrostol", "Alopgeni2"),
                   cumv = c(1,2,3,4))
tabular(right * name ~  left * cumv * mean, data = test)

Which gives the output:

                 left     
                 BF   SF  
                 cumv cumv
 right name      mean mean
 BF    Agrostol  NaN    1 
       Alopgeni  NaN    2 
       Alopgeni2 NaN  NaN 
 SF    Agrostol    3  NaN 
       Alopgeni  NaN  NaN 
       Alopgeni2   4  NaN
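For comparison, a similar cross-tabulation can be sketched in base R with `tapply()`, without an extra package. This is an alternative on the same toy data, not the tables approach itself: empty combinations come out as `NA` rather than `NaN`, only the right/name combinations that actually occur get a row, and the pasted row key is a choice made here for illustration.

```r
test <- data.frame(left  = c("SF", "SF", "BF", "BF"),
                   right = c("BF", "BF", "SF", "SF"),
                   name  = c("Agrostol", "Alopgeni", "Agrostol", "Alopgeni2"),
                   cumv  = c(1, 2, 3, 4),
                   stringsAsFactors = FALSE)

# Rows: right/name combinations; columns: left; cells: mean of cumv.
row_key <- paste(test$right, test$name)
wide <- tapply(test$cumv, list(row_key, test$left), mean)
print(wide)
```

Because the result is an ordinary numeric matrix, it can be indexed by name, for example `wide["BF Agrostol", "SF"]`.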