Convert community data to wide format for the vegan package

Time: 2018-06-05 03:02:16

Tags: r dataframe transformation vegan

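I am trying to use the vegan package to analyze some community data. My data is in the wrong format, and I am looking for a way to change it. What I have looks like this: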

Habitat          Species        Abundance
1                  A                3
2                  B                2
3                  C                1
1                  D                5
2                  A                8
3                  F                4

What I think I need is:

Habitat      Species A       Species B       Species C    Species D    Species F
1                3               0              0              5          0
2                8               ...... etc
3                0

Or can vegan take some other format? I am trying to calculate the similarity of species composition between habitats.

3 Answers:

Answer 0 (score: 1)

The matrify() function in the labdsv package is made for exactly this kind of community analysis. It:

  

Takes a data.frame in three column form (sample.id, taxon, abundance) and converts it into full matrix form, and then exports it as a data.frame with the appropriate row.names and column names.

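In other words, it converts your data from long to wide format, so that each row represents a sample (in your case a "habitat"; sometimes this would be a "plot"), each column represents a species, and each cell gives the abundance of a given species (column) in a given habitat (row).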

Example

dat <- data.frame(Habitat = c('Hab1','Hab1','Hab2','Hab2','Hab2','Hab3','Hab3'),
                  Species = c('Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp3'),
                  Abundance = c(1,2,1,3,2,2,1))

print(dat)

  Habitat Species Abundance
1    Hab1     Sp1         1
2    Hab1     Sp2         2
3    Hab2     Sp1         1
4    Hab2     Sp3         3
5    Hab2     Sp4         2
6    Hab3     Sp2         2
7    Hab3     Sp3         1

library(labdsv)
matrify(dat)

     Sp1 Sp2 Sp3 Sp4
Hab1   1   2   0   0
Hab2   1   0   3   2
Hab3   0   2   1   0
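If you would rather not add a dependency, base R's xtabs() can build the same habitat-by-species table. This is a sketch, not labdsv's implementation: it assumes one row per habitat-species pair in the input (xtabs() sums any duplicate pairs) and reuses the dat example above.

```r
# Same long-format example data as above
dat <- data.frame(Habitat = c('Hab1','Hab1','Hab2','Hab2','Hab2','Hab3','Hab3'),
                  Species = c('Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp3'),
                  Abundance = c(1,2,1,3,2,2,1))

# Cross-tabulate: rows = habitats, columns = species, cells = summed abundance.
# Habitat-species pairs that never occur in the data get 0.
tab <- xtabs(Abundance ~ Habitat + Species, data = dat)

# Drop the table class so the result is a plain data.frame with
# habitats as row names and species as column names
wide <- as.data.frame.matrix(tab)
wide
```

For this data the values should match the matrify() output shown above.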

Bonus:

Years ago I rewrote matrify so that it can handle longitudinal community data.

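  • Specifically, my matrify2() function creates a row for each plot-year combination (i.e., separate rows for the same plot resampled in different years) by duplicating the plot (or habitat) row labels and adding a Year column.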

Here is the code:


#Create data.frame with PLOT, YEAR, and ABUNDANCE for each SPEC:

# Function that reshapes the data.frame so that:
#   columns = individual SPECs, rows = plot by Year
# Note: code modified from the matrify() function in the labdsv package (v. 1.6-1)

matrify2 <- function(data) {
  # data must have four columns: plot, SPEC, abundance measure, Year
  if (ncol(data) != 4)
    stop("data frame must have four column format")
  plt <- factor(data[, 1])
  spc <- factor(data[, 2])
  abu <- data[, 3]
  yrs <- factor(data[, 4])
  plt.codes <- sort(levels(factor(plt)))  # sorted plot IDs
  spc.codes <- levels(factor(spc))        # sorted SPEC names
  yrs.codes <- sort(levels(factor(yrs)))  # sorted sampling Years
  # Empty matrix: one row per plot x Year combination, one column per SPEC
  taxa <- matrix(0, nrow = length(plt.codes) * length(yrs.codes),
                 ncol = length(spc.codes))
  # Plot and Year ID columns in the same order as the matrix rows
  # (plot varies fastest within each Year); added at the end of the function
  plt.list <- rep(plt.codes, length(yrs.codes))
  yrs.list <- rep(yrs.codes, each = length(plt.codes))
  col <- match(spc, spc.codes)      # column index of each record's SPEC
  row.plt <- match(plt, plt.codes)  # position of each record's plot in plt.codes
  row.yrs <- match(yrs, yrs.codes)  # position of each record's Year in yrs.codes
  for (i in seq_along(abu)) {
    # Row number, given that rows are laid out as rep(plot, times = number of Years)
    row <- row.plt[i] + length(plt.codes) * (row.yrs[i] - 1)
    if (!is.na(abu[i])) {  # only use non-NA values; NA abundances are ignored
      # Add this record's abundance to its SPEC column and plot/Year row,
      # summing across duplicate records
      taxa[row, col[i]] <- sum(taxa[row, col[i]], abu[i])
    }
  }
  taxa <- data.frame(taxa)  # convert to data.frame for easier manipulation
  # Prepend the plot and Year ID columns to the abundance columns
  taxa <- cbind(plt.list, yrs.list, taxa)
  names(taxa) <- c('Plot', 'Year', spc.codes)
  taxa
}

Example

dat.y <- data.frame(Habitat = c('Hab1','Hab1','Hab2','Hab2','Hab2','Hab3','Hab3','Hab1','Hab1','Hab2','Hab2','Hab2','Hab3','Hab3'),
                    Species = c('Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp3','Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp3'),
                    Abundance = c(1,2,1,3,2,2,1,1,2,1,3,2,2,1),
                    Year = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2))

print(dat.y)

   Habitat Species Abundance Year
1     Hab1     Sp1         1    1
2     Hab1     Sp2         2    1
3     Hab2     Sp1         1    1
4     Hab2     Sp3         3    1
5     Hab2     Sp4         2    1
6     Hab3     Sp2         2    1
7     Hab3     Sp3         1    1
8     Hab1     Sp1         1    2
9     Hab1     Sp2         2    2
10    Hab2     Sp1         1    2
11    Hab2     Sp3         3    2
12    Hab2     Sp4         2    2
13    Hab3     Sp2         2    2
14    Hab3     Sp3         1    2

matrify2(dat.y)

  Plot Year Sp1 Sp2 Sp3 Sp4
1 Hab1    1   1   2   0   0
2 Hab2    1   1   0   3   2
3 Hab3    1   0   2   1   0
4 Hab1    2   1   2   0   0
5 Hab2    2   1   0   3   2
6 Hab3    2   0   2   1   0

Also, FYI, you should read the labdsv documentation.
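For comparison, a base-R sketch of the same plot-by-year layout that matrify2() produces, using interaction() to make one factor level per plot-year combination and xtabs() to fill the table. Unlike matrify2(), plot and year end up combined in the row names (the "_" separator and the PlotYear column name are arbitrary choices for this sketch), rather than as separate Plot and Year columns.

```r
# Same longitudinal example data as above (two sampling years)
dat.y <- data.frame(
  Habitat = rep(c('Hab1','Hab1','Hab2','Hab2','Hab2','Hab3','Hab3'), 2),
  Species = rep(c('Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp3'), 2),
  Abundance = rep(c(1,2,1,3,2,2,1), 2),
  Year = rep(c(1, 2), each = 7)
)

# One level per plot-year combination. With the default lex.order = FALSE,
# the plot varies fastest: Hab1_1, Hab2_1, Hab3_1, Hab1_2, ... which is the
# same row order matrify2() uses.
dat.y$PlotYear <- interaction(dat.y$Habitat, dat.y$Year, sep = "_")

# Sum abundances per plot-year and species; combinations with no records get 0
tab <- xtabs(Abundance ~ PlotYear + Species, data = dat.y)
wide.y <- as.data.frame.matrix(tab)
wide.y
```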

  

The vegan package, together with the labdsv package, provides most of the standard descriptive community-analysis tools.
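Since the goal in the question is the similarity of species composition between habitats, the wide table is what vegan's vegdist() expects, and its default method is Bray-Curtis. To show what that computes, here is a base-R sketch of Bray-Curtis dissimilarity, d(i, k) = sum|x_i - x_k| / sum(x_i + x_k), applied to the matrify() output above. The helper name bray_curtis is hypothetical; with vegan installed, vegdist(wide, method = "bray") should give the same values.

```r
# Habitat-by-species table (same values as the matrify() output above)
wide <- matrix(c(1, 2, 0, 0,
                 1, 0, 3, 2,
                 0, 2, 1, 0),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("Hab1", "Hab2", "Hab3"),
                               c("Sp1", "Sp2", "Sp3", "Sp4")))

# Hypothetical helper: Bray-Curtis dissimilarity between every pair of rows.
# 0 = identical composition, 1 = no species in common.
# Note: two all-zero rows would give 0/0 = NaN.
bray_curtis <- function(x) {
  x <- as.matrix(x)
  n <- nrow(x)
  d <- matrix(0, n, n, dimnames = list(rownames(x), rownames(x)))
  for (i in seq_len(n)) {
    for (k in seq_len(n)) {
      d[i, k] <- sum(abs(x[i, ] - x[k, ])) / sum(x[i, ] + x[k, ])
    }
  }
  as.dist(d)  # keep only the lower triangle, like vegdist() returns
}

bc <- bray_curtis(wide)
bc
# Similarity is often reported as 1 - dissimilarity
1 - bc
```

Here Hab1 and Hab3 share more of their abundance (dissimilarity 1/3) than either does with Hab2 (7/9).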

Answer 1 (score: 0)

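You may want to spread() your data (from the tidyr package). For example:

library(tidyr)
mydata %>% 
  spread(Species, Abundance, fill = 0)

fill = 0 records species that are absent from a habitat as 0 rather than NA. In current tidyr, spread() is superseded by pivot_wider().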
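Without tidyr, base R's reshape() does the same long-to-wide step. A sketch on the data from the question: reshape() leaves NA for habitat-species pairs that never occur, so they are set to 0 afterwards, and it names the new columns Abundance.<species>, so the prefix is stripped.

```r
# Long-format data from the question
long <- data.frame(Habitat = c(1, 2, 3, 1, 2, 3),
                   Species = c("A", "B", "C", "D", "A", "F"),
                   Abundance = c(3, 2, 1, 5, 8, 4))

# One row per Habitat, one Abundance.<Species> column per species
wide <- reshape(long, idvar = "Habitat", timevar = "Species",
                direction = "wide")

# Species absent from a habitat come back as NA; record them as 0
wide[is.na(wide)] <- 0

# Strip the "Abundance." prefix so the columns are just the species names
names(wide) <- sub("^Abundance\\.", "", names(wide))
wide
```

Columns come out in order of first appearance (A, B, C, D, F) rather than alphabetically.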

Answer 2 (score: 0)

This is how I did it, using dcast (melt() and dcast() come from the reshape2 package):

  • Create a data sample (the abundances come from sample(), so your values will differ from those below): cc=data.frame(habitat=c(1,2,3,1,2,3),species=c('a','e','a','e','g','a'), abundance=sample(1:10000,6))

The output looks like this (ignore the first column; it is the automatic row index R prints. What matters are the other columns):

> cc
  habitat species abundance
1       1       a      7814
2       2       e      7801
3       3       a      9510
4       1       e      7443
5       2       g      2160
6       3       a      4026
  • Now melt it: m=melt(cc, id.vars=c("habitat","species")). Output:
  habitat species  variable value
1       1       a abundance  7814
2       2       e abundance  7801
3       3       a abundance  9510
4       1       e abundance  7443
5       2       g abundance  2160
6       3       a abundance  4026
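  • Now reshape with dcast(m,habitat~species,fun.aggregate=mean), which gives the following (habitat 3 has two records for species a, so their mean, 6768, is reported; NaN marks combinations with no records):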
  habitat    a    e    g
1       1 7814 7443  NaN
2       2  NaN 7801 2160
3       3 6768  NaN  NaN
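The NaN cells are habitat-species pairs with no records; in a site-by-species matrix for vegan those are normally recorded as 0. reshape2's dcast() has a fill argument for that; a base-R sketch of the same table uses tapply() with default = 0 (available since R 3.4.0). The data below reuses the values printed above instead of sample(). Like the dcast() call, it averages duplicate records with mean; for raw counts, FUN = sum may be the more natural choice.

```r
# Data as printed above (fixed values instead of sample())
cc <- data.frame(habitat = c(1, 2, 3, 1, 2, 3),
                 species = c('a', 'e', 'a', 'e', 'g', 'a'),
                 abundance = c(7814, 7801, 9510, 7443, 2160, 4026))

# Mean abundance per habitat x species; default = 0 fills combinations
# that never occur with 0 instead of NA
wide <- tapply(cc$abundance, list(cc$habitat, cc$species), mean, default = 0)
wide
```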

More information on reshaping here.

Kf