Question

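I have two data sets, abundance data and environmental data, and I need to "link" or "overlay" them in a PCA:

I want to run a PCA in R that treats the individual ciliate species as the individuals and the environmental parameters as the variables. What I have are two separate data frames: "abundance", which gives the abundance of each species at the sampling sites, and "environment", which gives the environmental parameters at the sampling sites. So the two data frames share one parameter: the site! If I run a PCA, I get either a plot with sites as individuals and environmental parameters as variables, or a plot with species as individuals and sites as variables. What I need is to link the data sets by the site parameter, so that I can run a PCA with ciliate species as individuals and environmental parameters as variables. So I would somehow need to link the two PCAs / data frames on the common site parameter. What I have done so far: I ran two separate PCAs, took the coordinates of the individuals (ciliate species) from PCA1 and the coordinates of the variables (environmental parameters) from PCA2, and drew them together in a biplot -> the plot is exactly what I need, but can it still be interpreted as a PCA, i.e. are the data frames really linked by the site parameter? Or am I just cheating with the data and losing interpretability?

The other option I tried was to calculate the environmental parameters for each ciliate species as weighted averages (weighted by the abundance of the ciliates at the sites) and then run a PCA on the data frame of ciliate species and weighted-average environmental parameters... which works, but I think I lose a lot of information that way... What do you think?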
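For reference, the weighted-average option described above could be sketched roughly as follows. This is a minimal base-R sketch using the same toy numbers as the code in this question; the names `env`, `w`, `species_env` and `wa.pca` are illustrative assumptions, not part of my actual script.

```r
# Toy data from this question: species in rows, sites in columns.
abundance <- matrix(
  c( 2,  5, 23, 4,
     4,  8, 56, 6,
    19,  9,  7, 2,
    34, 12,  1, 8,
     3,  0,  1, 5,
     6,  1,  1, 1,
     9,  1,  2, 7),
  ncol = 4, byrow = TRUE,
  dimnames = list(paste0("spec", 1:7), paste0("site", 1:4))
)

# Environmental parameters of the same four sites.
environment <- data.frame(
  Temp = c(24, 24.5, 23.5, 25),
  Chla = c(2.2, 1.5, 2.0, 3.4),
  Plo  = c(1000, 2000, 1500, 200),
  Plo2 = c(200, 400, 600, 200),
  row.names = paste0("site", 1:4)
)

# Align the two tables on the shared key (site) before combining them.
env <- as.matrix(environment[colnames(abundance), ])

# Row-normalise the abundances so that each species' weights sum to 1.
w <- abundance / rowSums(abundance)

# Abundance-weighted mean of every parameter for every species
# (7 species x 4 parameters).
species_env <- w %*% env

# PCA with species as individuals and weighted-mean parameters as variables.
wa.pca <- prcomp(species_env, scale. = TRUE)
biplot(wa.pca)
```

Each species is reduced to one averaged value per parameter here, so the spread of a species across sites is no longer visible in the result.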

    #Create a random data frame of abundance data; I am sure this can be done more simply and elegantly than this ;)
    species<-c("spec1", "spec2", "spec3", "spec 4", "spec 5", "spec 6", "spec7")
    site1<-c(2,4,19,34,3,6,9)
    site2<-c(5,8,9,12,0,1,1)
    site3<-c(23,56,7,1,1,1,2)
    site4<-c(4,6,2,8,5,1,7)
    abundance<-data.frame(species,site1,site2,site3,site4)
    rownames(abundance)<-abundance$species
    abundance<-abundance[,-1]
    #Create random dataframe of environmental data
    #environmental parameters of the sites
    X<-c("site1","site2","site3","site4")
    Temp<-c(24,24.5,23.5,25)
    Chla<-c(2.2,1.5,2.0,3.4)
    Plo<-c(1000,2000,1500,200)
    Plo2<-c(200,400,600,200)
    environment<-data.frame(X,Temp,Chla,Plo,Plo2)
    rownames(environment)<-environment$X
    environment<-environment[,-1]
    ###PCA on abundance data
    #hellinger pre-transformation of abundance data
    library(vegan)
    abu.h<-decostand(abundance,"hellinger")
    abu.h.pca<-prcomp(abu.h)
    envir.pca<-prcomp(environment,scale=TRUE)
    biplot(abu.h.pca)
    ##and now I would need to discard the site vectors and overlay the plot with
    #the environmental factors of the sites, according to my prof?
    library(factoextra) # must be loaded before fviz_pca_ind() and get_pca_ind()
    #Graph of individuals
    fviz_pca_ind(abu.h.pca)
    ##get coordinates
    ind<-get_pca_ind(abu.h.pca)
    head(ind$coord) 
    #x in biplot 
    ind<-ind$coord 
    ind<-ind[,1:2]
    ind 
    #y variables
    # Extract the results for variables only, from the environmental PCA

    vari<-get_pca_var(envir.pca)
    var<-vari$coord 
    var<-var[,1:2] 
    var 
    biplot(ind, var, var.axes = TRUE)

Answer 1

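I have never done what you describe, but I know that you can use a vector overlay on an nMDS to relate it to environmental (abiotic) data. Whether you can do this with PCA I am not sure, but at least my PRIMER manual mentions that a PCA of abiotic data using Euclidean distance matches well with an nMDS of biotic data, which is how PRIMER's BEST function works. But this isn't PRIMER.

See the vegan::envfit function. The intro vignette covers it briefly. The vegan tutor covers it a bit more.

I transposed the species data and ran an nMDS on the species data.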

    library(vegan)

    species <- c("spec1", "spec2", "spec3", "spec 4", "spec 5", "spec 6", "spec7")
    site1 <- c(2,4,19,34,3,6,9)
    site2 <- c(5,8,9,12,0,1,1)
    site3 <- c(23,56,7,1,1,1,2)
    site4 <- c(4,6,2,8,5,1,7)
    abundance <- data.frame(species,site1,site2,site3,site4)
    rownames(abundance) <- abundance$species
    abundance <- abundance[,-1]
    abundance <- t(abundance)

    X <- c("site1","site2","site3","site4")
    Temp <- c(24,24.5,23.5,25)
    Chla <- c(2.2,1.5,2.0,3.4)
    Plo <- c(1000,2000,1500,200)
    Plo2 <- c(200,400,600,200)
    environment <- data.frame(X,Temp,Chla,Plo,Plo2)
    rownames(environment) <- environment$X
    environment <- environment[,-1]

    AbEnvMDS <- metaMDS(abundance, k = 2)
    AbEnvFit <- envfit(AbEnvMDS, environment)

    plt <- plot(AbEnvMDS) # displays both sites (empty circles) and species (red +)
    plt <- plot(AbEnvMDS, display = "species") # displays only species (red +)
    plt
    identify(plt, what = "species") # choose your points
    plot(AbEnvFit) # overlays your environment
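If you want to stay with PCA instead of nMDS, the same vector-overlay idea can be sketched by hand in base R: run the PCA on the sites x species table, then regress each environmental variable on the site scores of the first two axes. This is only a rough illustration of the vector-fitting idea, not vegan's implementation; `fit_vector`, `env_fit` and `heads` are hypothetical names, and the Hellinger transformation is written out manually.

```r
# Toy data from the question, arranged as sites x species.
abundance <- rbind(
  site1 = c( 2,  4, 19, 34, 3, 6, 9),
  site2 = c( 5,  8,  9, 12, 0, 1, 1),
  site3 = c(23, 56,  7,  1, 1, 1, 2),
  site4 = c( 4,  6,  2,  8, 5, 1, 7)
)
colnames(abundance) <- paste0("spec", 1:7)

environment <- data.frame(
  Temp = c(24, 24.5, 23.5, 25),
  Chla = c(2.2, 1.5, 2.0, 3.4),
  Plo  = c(1000, 2000, 1500, 200),
  Plo2 = c(200, 400, 600, 200),
  row.names = rownames(abundance)
)

# Hellinger transformation by hand: square root of row-wise relative abundance.
abu.h <- sqrt(abundance / rowSums(abundance))

# PCA of the sites x species table; keep the site scores on the first two axes.
pca <- prcomp(abu.h)
site_scores <- pca$x[, 1:2]

# Regress one environmental variable on the two axes and return the
# unit-length direction of the fitted gradient plus the R^2 of the fit.
fit_vector <- function(y, scores) {
  m <- lm(y ~ scores)
  b <- coef(m)[-1]                      # slopes on PC1 and PC2
  c(b / sqrt(sum(b^2)), summary(m)$r.squared)
}
env_fit <- t(sapply(environment, fit_vector, scores = site_scores))
colnames(env_fit) <- c("PC1", "PC2", "r2")

# Species loadings as labels, environmental vectors as arrows.
# Arrow length is scaled by sqrt(r2), so weakly fitted variables are shorter.
heads <- env_fit[, c("PC1", "PC2")] * sqrt(env_fit[, "r2"])
plot(pca$rotation[, 1:2], type = "n", xlim = c(-1, 1), ylim = c(-1, 1), asp = 1)
text(pca$rotation[, 1:2], labels = rownames(pca$rotation))
arrows(0, 0, heads[, 1], heads[, 2], col = "blue", length = 0.1)
text(heads, labels = rownames(env_fit), col = "blue", pos = 3)
```

With only four sites, each regression has three coefficients and a single residual degree of freedom, so the R^2 values will be high almost by construction and should not be over-interpreted. According to its documentation, `envfit` also accepts a plain matrix of ordination scores, so passing the site scores to it may be the more convenient route in practice.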

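Can I overlay the coordinates of the variables from one PCA on the coordinates of the individuals from a second PCA and still interpret the result?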
