How to … in R

Date: 2017-11-05 19:29:35

Tags: r loops lapply

Data frame:

mydata <- structure(list(
  ParkName = c("SEP", "CSSP", "SEP", "ONF", "SEP", "ONF", "SEP", "CSSP", "ONF", "SEP",
               "CSSP", "PPRSP", "PPRSP", "SEP", "ONF", "PPRSP", "ONF", "SEP", "SEP", "ONF"),
  Year = c(2001, 2005, 1998, 2011, 1991, 1991, 1991, 1991, 1991, 1992,
           1992, 1992, 1992, 1992, 1992, 1992, 1992, 1993, 1994, 1994),
  LatinName = c("Mola mola", "Clarias batrachus", "Lithobates catesbeianus",
                "Rana catesbeiana", "Rana catesbeiana", "Rana yellowis",
                "Rana catesbeiana", "Solenopsis sp1", "Rana catesbeiana",
                "Rana catesbeiana", "Pratensis", "Rana catesbeiana",
                "Rana catesbeiana", "sp2", "Orchidaceae", "Rana catesbeiana",
                "Formica", "Rana catesbeiana", "Rana catesbeiana", "sp2"),
  NumTotal = c(1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 100, 2, 1, 2)),
  row.names = c(NA, -20L), class = c("tbl_df", "tbl", "data.frame"))

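This data set records the abundance of different species in different parks over several years. Keep in mind that this is only a sample; the real data set is quite large. What I basically want to do is build a species-by-park matrix for every year in which data were recorded, and then use the "vegan" package to calculate a diversity index for every park in every year.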
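For reference, the value the code below reports, exp(diversity(x2, index = "shannon")), is the exponential of the Shannon index H = -sum(p * log(p)), where p are the relative abundances within one park; it is often called the Hill number of order 1 or the "effective number of species". The base-R sketch below recomputes it by hand so the numbers in the answers can be checked. The helper name hill1 is hypothetical and is not part of vegan.

```r
# Hypothetical helper: exp(Shannon index) for one vector of abundance counts.
# vegan's Shannon index uses the natural log by default, as this does.
hill1 <- function(counts) {
  counts <- counts[counts > 0]   # species with zero counts contribute nothing
  p <- counts / sum(counts)      # relative abundances
  exp(-sum(p * log(p)))          # exp of the Shannon index
}

hill1(c(1, 100))  # ONF in 1992 (Orchidaceae = 1, Formica = 100): 1.057118
hill1(c(1, 1))    # two equally abundant species: 2
hill1(3)          # a single species: 1
```

An evenly split park scores as many "effective species" as it has species, while a park dominated by one species scores close to 1.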

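With help from the community, I managed to create a list of data frames, one per year. I then extracted one of those data frames, converted it into a species-by-park matrix, and obtained the diversity value of every park for that particular year. Here is the code I used: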

library(vegan)
dfList <- split(mydata, mydata$Year)  # obtain a data frame for every year
x <- data.frame(dfList[1])            # select the data frame for one year
x2 <- xtabs(x$X1991.NumTotal ~ x$X1991.ParkName + x$X1991.LatinName,
            data = x)                 # convert it into a species x site matrix
exp(diversity(x2, index = "shannon")) # extract diversity values
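Why the columns above are called X1991.NumTotal and so on: dfList[1] uses single brackets, so it returns a list of length one, and data.frame() prefixes each column with that list element's name ("1991", made syntactic as "X1991"). Double brackets return the data frame itself, which keeps the original column names and lets xtabs() take a plain formula. The tiny two-year data frame below is hypothetical, made up only to show the naming.

```r
# Hypothetical two-year example with the same column layout as mydata
d <- data.frame(ParkName  = c("SEP", "ONF", "SEP"),
                Year      = c(1991, 1991, 1992),
                LatinName = c("Mola mola", "Formica", "sp2"),
                NumTotal  = c(2, 1, 1),
                stringsAsFactors = FALSE)
dfList <- split(d, d$Year)

# `[` keeps the list wrapper, so data.frame() prefixes the column names
prefixed <- names(data.frame(dfList[1]))
prefixed

# `[[` extracts the data frame itself; the original names survive
x  <- dfList[["1991"]]
x2 <- xtabs(NumTotal ~ ParkName + LatinName, data = x)  # park x species table
x2
```

Indexing by name (dfList[["1991"]]) rather than by position also makes it obvious which year is being processed.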

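How can I run a loop that does for every year what I did for one year, and end up with a list of diversity values for each park in each year? The problem I run into with a loop is that the data set is very unbalanced (the number of parks and species differs from year to year), so the lengths of the results do not match one another.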
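Since the question asks specifically about a loop, here is a sketch of an explicit for loop. The key point is to store each year's result in a list rather than a vector or matrix: list elements may have different lengths, so unbalanced years are not a problem. To keep the sketch runnable without vegan, a hypothetical helper hill1() applied to each row stands in for exp(diversity(x2, index = "shannon")); with vegan loaded you would call that instead.

```r
# Sample data from the question, rebuilt as a plain data.frame
mydata <- data.frame(
  ParkName = c("SEP", "CSSP", "SEP", "ONF", "SEP", "ONF", "SEP", "CSSP", "ONF", "SEP",
               "CSSP", "PPRSP", "PPRSP", "SEP", "ONF", "PPRSP", "ONF", "SEP", "SEP", "ONF"),
  Year = c(2001, 2005, 1998, 2011, 1991, 1991, 1991, 1991, 1991, 1992,
           1992, 1992, 1992, 1992, 1992, 1992, 1992, 1993, 1994, 1994),
  LatinName = c("Mola mola", "Clarias batrachus", "Lithobates catesbeianus",
                "Rana catesbeiana", "Rana catesbeiana", "Rana yellowis",
                "Rana catesbeiana", "Solenopsis sp1", "Rana catesbeiana",
                "Rana catesbeiana", "Pratensis", "Rana catesbeiana",
                "Rana catesbeiana", "sp2", "Orchidaceae", "Rana catesbeiana",
                "Formica", "Rana catesbeiana", "Rana catesbeiana", "sp2"),
  NumTotal = c(1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 100, 2, 1, 2),
  stringsAsFactors = FALSE)  # character columns: only parks present in a year appear

# Hypothetical stand-in for exp(vegan::diversity(x, index = "shannon")) on one row
hill1 <- function(counts) {
  counts <- counts[counts > 0]
  p <- counts / sum(counts)
  exp(-sum(p * log(p)))
}

dfList <- split(mydata, mydata$Year)  # one data frame per year

# Pre-allocate one list slot per year, named by year
result <- vector("list", length(dfList))
names(result) <- names(dfList)

for (yr in names(dfList)) {
  x  <- dfList[[yr]]                                      # data for one year
  x2 <- xtabs(NumTotal ~ ParkName + LatinName, data = x)  # park x species matrix
  result[[yr]] <- apply(x2, 1, hill1)                     # one value per park
}

result[["1991"]]  # CSSP 1, ONF 2, SEP 1
```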

2 Answers

Answer 0 (score: 1)

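A simple lapply over dfList does what you want. Because lapply returns a list with one element per year, it does not matter that each year has a different number of parks.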

result <- lapply(dfList, function(x) {
  x2 <- xtabs(NumTotal ~ ParkName + LatinName, data = x)  # park x species matrix
  exp(diversity(x2, index = "shannon"))                    # one value per park
})
result  # named list, one element per year
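If a single table is more convenient than a list, the named list returned by lapply() (one named numeric vector of park values per year) can be stacked into a long data frame. In the sketch below, result is a hand-built stand-in with the same shape; its values are copied from the 1991 and 1993 rows of the base R output in the next answer.

```r
# Hand-built stand-in for `result`: a named list of named numeric vectors
result <- list(
  "1991" = c(CSSP = 1, ONF = 2, SEP = 1),
  "1993" = c(SEP = 1)
)

# Turn each year's vector into a small data frame, then row-bind them
long <- do.call(rbind, lapply(names(result), function(yr) {
  data.frame(year = as.numeric(yr),
             park = names(result[[yr]]),
             div  = unname(result[[yr]]),
             stringsAsFactors = FALSE)
}))
long
```

This produces the same year/park/div layout as the base R answer, but starting from the lapply() result.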

Answer 1 (score: 1)

Using base R:

do.call(rbind, by(mydata, mydata$Year, function(d) {
  xt <- xtabs(NumTotal ~ ParkName + LatinName, data = d)  # park x species matrix
  # one row per park; diversity() defaults to index = "shannon"
  data.frame(year = d$Year[1], park = dimnames(xt)[[1]],
             div = exp(diversity(xt)))
}))

#            year  park      div
# 1991.CSSP  1991  CSSP 1.000000
# 1991.ONF   1991   ONF 2.000000
# 1991.SEP   1991   SEP 1.000000
# 1992.CSSP  1992  CSSP 1.000000
# 1992.ONF   1992   ONF 1.057118
# 1992.PPRSP 1992 PPRSP 1.000000
# 1992.SEP   1992   SEP 2.000000
# 1993       1993   SEP 1.000000
# 1994.ONF   1994   ONF 1.000000
# 1994.SEP   1994   SEP 1.000000
# 1998       1998   SEP 1.000000
# 2001       2001   SEP 1.000000
# 2005       2005  CSSP 1.000000
# 2011       2011   ONF 1.000000

Using data.table:

library(data.table)
dt <- as.data.table(mydata)  # mydata is a tibble; the DT[i, j, by] syntax needs a data.table
dt[ , {xt <- xtabs(NumTotal ~ ParkName + LatinName, data = .SD)
  .(park = dimnames(xt)[[1]], div = exp(diversity(xt)))}, by = Year]

#     Year  park      div
#  1: 2001   SEP 1.000000
#  2: 2005  CSSP 1.000000
#  3: 1998   SEP 1.000000
#  4: 2011   ONF 1.000000
#  5: 1991  CSSP 1.000000
#  6: 1991   ONF 2.000000
#  7: 1991   SEP 1.000000
#  8: 1992  CSSP 1.000000
#  9: 1992   ONF 1.057118
# 10: 1992 PPRSP 1.000000
# 11: 1992   SEP 2.000000
# 12: 1993   SEP 1.000000
# 13: 1994   ONF 1.000000
# 14: 1994   SEP 1.000000

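Note that data.table's by preserves the order of rows within each group as well as the order in which the groups first appear, which is why the years above are not sorted, unlike the base R output.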
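The ordering difference between the two answers can be seen directly from the Year column. Base R's by() groups on factor(Year), whose levels are sorted, while data.table's by keeps order of first appearance, which is the order unique() returns. The sketch below uses only base R and copies the Year values from mydata.

```r
# Year column of mydata, in row order
Year <- c(2001, 2005, 1998, 2011, 1991, 1991, 1991, 1991, 1991, 1992,
          1992, 1992, 1992, 1992, 1992, 1992, 1992, 1993, 1994, 1994)

# Group order used by base R by()/split(): sorted factor levels
sorted_groups <- levels(factor(Year))
sorted_groups

# Group order used by data.table's `by`: order of first appearance
appearance_groups <- unique(Year)
appearance_groups
```

If sorted output is wanted from data.table, using keyby = Year instead of by = Year sorts the groups (and sets a key on the result).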