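I am trying to use the vegan package to analyze some community data. My data are in the wrong format, and I am looking for a way to reshape them.
What I have looks like this: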
Habitat Species Abundance
1 A 3
2 B 2
3 C 1
1 D 5
2 A 8
3 F 4
What I think I need is:
Habitat Species A Species B Species C Species D Species F
1 3 0 0 5 0
2 8 ...... etc
3 0
Or can vegan accept some other format? I am trying to calculate the similarity in species composition between habitats.
Answer 0 (score: 1)
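The matrify() function in the labdsv package is made exactly for this kind of community analysis.
It takes a data.frame in three-column form (sample.id, taxon, abundance) and converts it into full matrix form, then exports it as a data.frame with the appropriate row.names and column names.
In other words, it converts your data from long format to wide format, so that each row represents a sample (in your case a "Habitat"; sometimes this would be a "plot"), each column represents a species, and each cell shows the abundance of that cell's species (column) in that cell's habitat (row).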
Example:
dat <- data.frame(Habitat = c('Hab1','Hab1','Hab2','Hab2','Hab2','Hab3','Hab3'),
Species = c('Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp3'),
Abundance = c(1,2,1,3,2,2,1))
print(dat)
Habitat Species Abundance
1 Hab1 Sp1 1
2 Hab1 Sp2 2
3 Hab2 Sp1 1
4 Hab2 Sp3 3
5 Hab2 Sp4 2
6 Hab3 Sp2 2
7 Hab3 Sp3 1
library(labdsv)
matrify(dat)
Sp1 Sp2 Sp3 Sp4
Hab1 1 2 0 0
Hab2 1 0 3 2
Hab3 0 2 1 0
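If you would rather not install labdsv, the same long-to-wide step can be sketched in base R with xtabs(). This is only an illustration using the values from the question; the object names (long, tab, wide) are my own and not part of any package.

```r
# Long-format data from the question (Habitat, Species, Abundance)
long <- data.frame(
  Habitat   = c(1, 2, 3, 1, 2, 3),
  Species   = c("A", "B", "C", "D", "A", "F"),
  Abundance = c(3, 2, 1, 5, 8, 4)
)

# xtabs() sums Abundance for each Habitat x Species pair;
# pairs that never occur in the data become 0
tab <- xtabs(Abundance ~ Habitat + Species, data = long)

# Turn the contingency table into a plain data.frame:
# habitats as rows, species as columns
wide <- as.data.frame.matrix(tab)
print(wide)
```

Like matrify(), this sums the abundances when the same habitat/species pair appears more than once.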
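I rewrote matrify() years ago so that it can handle longitudinal community data. The matrify2() function creates a row for each plot-year combination (i.e., resampled rows of the same plot) by repeating the plot (or habitat) row labels and adding a Year column. Here is the code: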
#Create data.frame with PLOT, YEAR, and ABUNDANCE for each SPEC:
#Creates function that can sort the data.frame output by:
#Columns = individual SPECS, #Rows = plot by Year
#Note: Code modified from matrify() function from labdsv package (v. 1.6-1)
matrify2 <- function(data) {
  # Data must have columns in this order: plot, SPEC, abundance measure, Year
  if (ncol(data) != 4)
    stop("data frame must have four column format")
  plt <- factor(data[, 1])
  spc <- factor(data[, 2])
  abu <- data[, 3]
  yrs <- factor(data[, 4])
  plt.codes <- sort(levels(factor(plt)))  ## sorted plot numbers
  spc.codes <- levels(factor(spc))        ## sorted SPEC names
  yrs.codes <- sort(levels(factor(yrs)))  ## sorted sampling Years
  ## Empty matrix with proper dimensions (every plot x Year combination by # of SPEC)
  taxa <- matrix(0, nrow = length(plt.codes) * length(yrs.codes), ncol = length(spc.codes))
  ## Plot and Year labels for every row, added as ID columns at the end of the function
  plt.list <- rep(plt.codes, length(yrs.codes))
  yrs.list <- rep(yrs.codes, each = length(plt.codes))
  col <- match(spc, spc.codes)      ## column index of each record's SPEC in spc.codes
  row.plt <- match(plt, plt.codes)  ## rank of each record's plot in plt.codes
  row.yrs <- match(yrs, yrs.codes)  ## rank of each record's Year in yrs.codes
  for (i in seq_along(abu)) {
    ## Rows are laid out as all plots for the first Year, then all plots for the next Year, etc.
    row <- row.plt[i] + length(plt.codes) * (row.yrs[i] - 1)
    if (!is.na(abu[i])) {  ## Only use non-NA values; NA abundances are ignored
      ## Add the abundance of record i to its SPEC column and plot/Year row, summing duplicates
      taxa[row, col[i]] <- sum(taxa[row, col[i]], abu[i])
    }
  }
  taxa <- data.frame(taxa)                 ## Convert to data.frame for easier manipulation
  taxa <- cbind(plt.list, yrs.list, taxa)  ## Add ID columns for plot and Year
  names(taxa) <- c('Plot', 'Year', spc.codes)
  taxa
}
Example:
dat.y <- data.frame(Habitat = c('Hab1','Hab1','Hab2','Hab2','Hab2','Hab3','Hab3','Hab1','Hab1','Hab2','Hab2','Hab2','Hab3','Hab3'),
Species = c('Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp3','Sp1','Sp2','Sp1','Sp3','Sp4','Sp2','Sp3'),
Abundance = c(1,2,1,3,2,2,1,1,2,1,3,2,2,1),
Year = c(1,1,1,1,1,1,1,2,2,2,2,2,2,2))
print(dat.y)
Habitat Species Abundance Year
1 Hab1 Sp1 1 1
2 Hab1 Sp2 2 1
3 Hab2 Sp1 1 1
4 Hab2 Sp3 3 1
5 Hab2 Sp4 2 1
6 Hab3 Sp2 2 1
7 Hab3 Sp3 1 1
8 Hab1 Sp1 1 2
9 Hab1 Sp2 2 2
10 Hab2 Sp1 1 2
11 Hab2 Sp3 3 2
12 Hab2 Sp4 2 2
13 Hab3 Sp2 2 2
14 Hab3 Sp3 1 2
matrify2(dat.y)
Plot Year Sp1 Sp2 Sp3 Sp4
1 Hab1 1 1 2 0 0
2 Hab2 1 1 0 3 2
3 Hab3 1 0 2 1 0
4 Hab1 2 1 2 0 0
5 Hab2 2 1 0 3 2
6 Hab3 2 0 2 1 0
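Also, FYI, you should read the documentation for labdsv:
The vegan package, together with the labdsv package, provides most of the standard tools for descriptive community analysis.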
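Since the stated goal is the similarity in species composition between habitats, here is a rough base-R sketch of one common measure, Bray-Curtis, computed from the wide matrix in the matrify() example. The helper bray_curtis() and the objects comm and bc are names I made up for illustration. In practice vegan's vegdist(comm, method = "bray") covers this, and it expects the same wide layout with numeric species columns only, so the Plot and Year ID columns from matrify2() would have to be dropped first.

```r
# Community matrix from the matrify() example: habitats as rows, species as columns
comm <- matrix(
  c(1, 2, 0, 0,
    1, 0, 3, 2,
    0, 2, 1, 0),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("Hab1", "Hab2", "Hab3"), c("Sp1", "Sp2", "Sp3", "Sp4"))
)

# Bray-Curtis dissimilarity between two abundance vectors:
# sum(|x - y|) / sum(x + y); 0 = identical composition, 1 = no shared species
bray_curtis <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Fill a habitat x habitat matrix with all pairwise dissimilarities
n <- nrow(comm)
bc <- matrix(0, n, n, dimnames = list(rownames(comm), rownames(comm)))
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    bc[i, j] <- bray_curtis(comm[i, ], comm[j, ])
  }
}

# Similarity is 1 - dissimilarity
print(round(1 - bc, 3))
```

Other indices (Jaccard, Sorensen, and so on) follow the same pattern; only the pairwise function changes.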
Answer 1 (score: 0)
You probably want to spread your data. For example:
library(tidyr)
mydata %>%
spread(Species, Abundance)
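One thing to watch: by default spread() leaves NA where a species was not recorded in a habitat, while a community matrix usually wants 0 there. A small sketch of how that might look, assuming mydata has the three columns from the question (the data frame below is a hypothetical stand-in, and wide/wide2 are my own names):

```r
library(tidyr)

# Hypothetical stand-in for `mydata`, using the values from the question
mydata <- data.frame(
  Habitat   = c(1, 2, 3, 1, 2, 3),
  Species   = c("A", "B", "C", "D", "A", "F"),
  Abundance = c(3, 2, 1, 5, 8, 4)
)

# fill = 0 writes zeros instead of NA for species not recorded in a habitat
wide <- spread(mydata, Species, Abundance, fill = 0)
print(wide)

# The same reshape with pivot_wider(), which supersedes spread() in newer tidyr
wide2 <- pivot_wider(mydata, names_from = Species, values_from = Abundance,
                     values_fill = list(Abundance = 0))
print(wide2)
```

Neither call sums duplicate habitat/species records, so aggregate the data first if the same pair can appear more than once.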
Answer 2 (score: 0)
This is how I would do it, using dcast:
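library(reshape2) # assumption: melt() and dcast() below come from reshape2; the original answer does not name the package
cc=data.frame(habitat=c(1,2,3,1,2,3),species=c('a','e','a','e','g','a'), abundance=sample(1:10000,6))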
The output looks like this (ignore the first column, since it is an automatic index that R adds when printing; the named columns are what matter):
> cc
habitat species abundance
1 1 a 7814
2 2 e 7801
3 3 a 9510
4 1 e 7443
5 2 g 2160
6 3 a 4026
m=melt(cc, id.vars=c("habitat","species"))
Output:
habitat species variable value
1 1 a abundance 7814
2 2 e abundance 7801
3 3 a abundance 9510
4 1 e abundance 7443
5 2 g abundance 2160
6 3 a abundance 4026
dcast(m,habitat~species,fun.aggregate=mean)
which yields:
habitat a e g
1 1 7814 7443 NaN
2 2 NaN 7801 2160
3 3 6768 NaN NaN
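Note that fun.aggregate=mean averages the two habitat 3 / species a records (9510 and 4026 give 6768) and leaves NaN where a species is absent. For an abundance matrix you will more likely want totals and zeros. A sketch under that assumption, using reshape2 and fixed numbers instead of sample() so the result is reproducible (the values are made up):

```r
library(reshape2)

# Fixed, made-up abundances (instead of sample()) so the result is reproducible
cc <- data.frame(
  habitat   = c(1, 2, 3, 1, 2, 3),
  species   = c("a", "e", "a", "e", "g", "a"),
  abundance = c(10, 20, 30, 40, 50, 60)
)

# Cast straight from the long data: sum duplicate habitat/species records
# and fill habitat/species pairs that never occur with 0
wide <- dcast(cc, habitat ~ species, value.var = "abundance",
              fun.aggregate = sum, fill = 0)
print(wide)
```

With value.var given explicitly, the intermediate melt() step is not needed.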
More information on reshaping here.