Question

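I am trying to use the vegan package to analyze some community data. My data are in the wrong format, and I am looking for a way to reshape them. What I have looks like this: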

Habitat          Species        Abundance
1                  A                3
2                  B                2
3                  C                1
1                  D                5
2                  A                8
3                  F                4

What I think I need is:

Habitat      Species A       Species B       Species C    Species D    Species F
1                3               0              0              5          0
2                8               ...... etc
3                0

Or can vegan accept some other format? I am trying to calculate the similarity in species composition between habitats.

Answer 1

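The matrify() function in the labdsv package is designed for exactly this kind of community analysis.

It takes a data.frame in three-column form (sample.id, taxon, abundance), converts it into full matrix form, and returns it as a data.frame with the appropriate row.names and column names.

In other words, it converts your data from long format to wide format, so that each row represents a sample (in your case a "habitat"; sometimes this would be a "plot"), each column represents a species, and each cell shows the abundance of that column's species in that row's habitat.

Example: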

dat <- data.frame(Habitat = c('Hab1','Hab1','Hab2','Hab2','Hab2','Hab3','Hab3'),
                  Species = c('Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp3'),
                  Abundance = c(1,2,1,3,2,2,1))
print(dat)
  Habitat Species Abundance
1    Hab1     Sp1         1
2    Hab1     Sp2         2
3    Hab2     Sp1         1
4    Hab2     Sp3         3
5    Hab2     Sp4         2
6    Hab3     Sp2         2
7    Hab3     Sp3         1

library(labdsv)
matrify(dat)
     Sp1 Sp2 Sp3 Sp4
Hab1   1   2   0   0
Hab2   1   0   3   2
Hab3   0   2   1   0
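If labdsv is not installed, the same long-to-wide conversion can be sketched in base R. This is only an illustration of the idea, not how matrify() is implemented; it assumes the same `dat` columns as the example above, and the name `wide` is made up for this sketch.

```r
# Same toy data as in the matrify() example
dat <- data.frame(
  Habitat   = c("Hab1", "Hab1", "Hab2", "Hab2", "Hab2", "Hab3", "Hab3"),
  Species   = c("Sp1", "Sp2", "Sp1", "Sp3", "Sp4", "Sp2", "Sp3"),
  Abundance = c(1, 2, 1, 3, 2, 2, 1)
)

# xtabs() sums Abundance for every Habitat x Species combination and
# fills combinations that never occur with 0
tab <- xtabs(Abundance ~ Habitat + Species, data = dat)

# Convert the contingency table to a plain data.frame:
# rows = habitats, columns = species
wide <- as.data.frame.matrix(tab)
print(wide)
```

Because xtabs() sums, repeated Habitat-Species records are added together rather than overwritten.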

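Addendum:

Years ago I rewrote matrify so that it can handle longitudinal community data.

Specifically, my matrify2() function creates a row for each plot-year combination (i.e., one row per resampling of the same plot) by repeating the plot (or habitat) row labels and adding a Year column.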

Here is the code:

# Reshape a data.frame with PLOT, SPEC, ABUNDANCE, and YEAR into wide form:
#   columns = individual SPECs
#   rows    = plot by Year
# Note: code modified from the matrify() function in the labdsv package (v. 1.6-1)
matrify2 <- function(data) {
  # Data must have columns: plot, SPEC, abundance measure, Year
  if (ncol(data) != 4)
    stop("data frame must have four column format")
  plt <- factor(data[, 1])
  spc <- factor(data[, 2])
  abu <- data[, 3]
  yrs <- factor(data[, 4])
  plt.codes <- sort(levels(factor(plt)))  # sorted plot numbers
  spc.codes <- levels(factor(spc))        # sorted SPEC names
  yrs.codes <- sort(levels(factor(yrs)))  # sorted sampling Years
  # Empty matrix with proper dimensions (plot x Year combinations by number of SPECs)
  taxa <- matrix(0, nrow = length(plt.codes) * length(yrs.codes),
                 ncol = length(spc.codes))
  # Plot and Year labels for every row, added as ID columns at the end of the function
  plt.list <- rep(plt.codes, length(yrs.codes))
  yrs.list <- rep(yrs.codes, each = length(plt.codes))
  col     <- match(spc, spc.codes)  # position of each record's SPEC in spc.codes
  row.plt <- match(plt, plt.codes)  # position of each record's plot in plt.codes
  row.yrs <- match(yrs, yrs.codes)  # position of each record's Year in yrs.codes
  for (i in seq_along(abu)) {
    # Rows are laid out as all plots for the first Year, then all plots for the next Year, ...
    row <- row.plt[i] + length(plt.codes) * (row.yrs[i] - 1)
    if (!is.na(abu[i])) {  # only use non-NA values; NA abundances are ignored
      # Add the abundance of record i to the proper SPEC column and plot/Year row,
      # summing across all identical records
      taxa[row, col[i]] <- sum(taxa[row, col[i]], abu[i])
    }
  }
  taxa <- data.frame(taxa)                   # convert to data.frame for easier manipulation
  taxa <- cbind(plt.list, yrs.list, taxa)    # add ID columns for plot and Year
  names(taxa) <- c('Plot', 'Year', spc.codes)
  taxa
}

Example:

dat.y <- data.frame(Habitat = c('Hab1','Hab1','Hab2','Hab2','Hab2','Hab3','Hab3','Hab1','Hab1','Hab2','Hab2','Hab2','Hab3','Hab3'),
                    Species = c('Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp3','Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp3'),
                    Abundance = c(1,2,1,3,2,2,1,1,2,1,3,2,2,1),
                    Year = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2))
print(dat.y)
   Habitat Species Abundance Year
1     Hab1     Sp1         1    1
2     Hab1     Sp2         2    1
3     Hab2     Sp1         1    1
4     Hab2     Sp3         3    1
5     Hab2     Sp4         2    1
6     Hab3     Sp2         2    1
7     Hab3     Sp3         1    1
8     Hab1     Sp1         1    2
9     Hab1     Sp2         2    2
10    Hab2     Sp1         1    2
11    Hab2     Sp3         3    2
12    Hab2     Sp4         2    2
13    Hab3     Sp2         2    2
14    Hab3     Sp3         1    2

matrify2(dat.y)
  Plot Year Sp1 Sp2 Sp3 Sp4
1 Hab1    1   1   2   0   0
2 Hab2    1   1   0   3   2
3 Hab3    1   0   2   1   0
4 Hab1    2   1   2   0   0
5 Hab2    2   1   0   3   2
6 Hab3    2   0   2   1   0

Also, FYI, you should read up on labdsv in its documentation.

The vegan package, together with the labdsv package, provides most of the standard descriptive community analysis tools.
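Once the data are in wide form, the question's actual goal (similarity in species composition between habitats) comes down to a pairwise index between rows. As a rough illustration, the sketch below computes Bray-Curtis dissimilarity by hand in base R on the matrify() output shown earlier. The helper name `bray_curtis` is made up for this sketch; in practice vegan's vegdist() computes this index (and others) for you.

```r
# Wide community matrix: rows = habitats, columns = species
comm <- matrix(
  c(1, 2, 0, 0,
    1, 0, 3, 2,
    0, 2, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Hab1", "Hab2", "Hab3"), c("Sp1", "Sp2", "Sp3", "Sp4"))
)

# Bray-Curtis dissimilarity between two abundance vectors:
# sum of absolute differences divided by the total abundance of both
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Fill a symmetric habitat x habitat dissimilarity matrix
n <- nrow(comm)
d <- matrix(0, n, n, dimnames = list(rownames(comm), rownames(comm)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    d[i, j] <- bray_curtis(comm[i, ], comm[j, ])
  }
}
print(round(d, 3))
```

A value of 0 means identical composition and 1 means no shared species; here Hab1 and Hab3 come out as the most similar pair.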

Answer 2

You may want to spread your data. For example:

library(tidyr)
mydata %>% 
  spread(Species, Abundance)
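One thing to watch for: a plain long-to-wide reshape leaves Habitat-Species combinations that never occur as NA, whereas community analyses usually expect 0 for an absent species (spread() has a `fill` argument for this). The sketch below shows the same idea with base R's reshape() so it runs without tidyr. The data are the values from the question, and `mydata` and `wide` are just illustrative names.

```r
# Long-format data from the question
mydata <- data.frame(
  Habitat   = c(1, 2, 3, 1, 2, 3),
  Species   = c("A", "B", "C", "D", "A", "F"),
  Abundance = c(3, 2, 1, 5, 8, 4)
)

# Long -> wide: one row per Habitat, one column per Species
wide <- reshape(mydata, idvar = "Habitat", timevar = "Species",
                direction = "wide")

# reshape() names the new columns "Abundance.A", "Abundance.B", ...;
# strip the prefix so the columns are just the species names
names(wide) <- sub("^Abundance\\.", "", names(wide))

# Species not recorded in a habitat come out as NA; treat them as 0
wide[is.na(wide)] <- 0
print(wide)
```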

Answer 3

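This is how I would do it, using dcast (melt and dcast come from the reshape2 package):

Create a data sample: cc=data.frame(habitat=c(1,2,3,1,2,3),species=c('a','e','a','e','g','a'), abundance=sample(1:10000,6)).

The output looks like this (ignore the first column; it is just the automatic row index that R prints. What matters are the other columns):

  habitat species abundance
1       1       a      7814
2       2       e      7801
3       3       a      9510
4       1       e      7443
5       2       g      2160
6       3       a      4026

Now melt: m=melt(cc, id.vars=c("habitat","species")). Output: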

  habitat species  variable value
1       1       a abundance  7814
2       2       e abundance  7801
3       3       a abundance  9510
4       1       e abundance  7443
5       2       g abundance  2160
6       3       a abundance  4026

Now reshape: dcast(m,habitat~species,fun.aggregate=mean), which produces:

  habitat    a    e    g
1       1 7814 7443  NaN
2       2  NaN 7801 2160
3       3 6768  NaN  NaN
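Note that fun.aggregate=mean averages repeated habitat-species records (6768 is the mean of 9510 and 4026) and leaves NaN where a species was never recorded. For abundance data you may prefer to sum duplicates and use 0 for absences. The following is a minimal base R sketch of that alternative, using the sample values printed above. It is a swapped-in technique using tapply(), not the dcast call itself, and `wide_sum` is just an illustrative name.

```r
# The sample data shown above, with the abundances fixed to the printed values
cc <- data.frame(
  habitat   = c(1, 2, 3, 1, 2, 3),
  species   = c("a", "e", "a", "e", "g", "a"),
  abundance = c(7814, 7801, 9510, 7443, 2160, 4026)
)

# Sum abundance within each habitat x species cell
wide_sum <- tapply(cc$abundance,
                   list(habitat = cc$habitat, species = cc$species),
                   sum)

# Cells with no records are NA; an unrecorded species has abundance 0
wide_sum[is.na(wide_sum)] <- 0
print(wide_sum)
```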

More information about reshaping is available here.
