Looping over different factor levels in a data frame

Time: 2017-12-06 23:04:35

Tags: r function loops for-loop lapply

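I have some code that randomly samples 1 to 10 rows from a data frame, replicates the random sampling 5 times, and computes a network metric (connectance) on each random sample. However, I would like to run this code separately for each level of "site" and "method" in my data frame.

How can I split the data frame (df) by site and method, run the code below on each subset, and then return all of the output in a single file with the columns "site", "method", "size" (the number of rows sampled) and "connectance"?

Here is what I have so far: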

df <- read.table(text = "bird_sp plant_sp value site method
                  1  species_a  plant_a     1    a      m
                  2  species_a  plant_a     1    a      m
                  3  species_b  plant_b     1    a      m
                  4  species_b  plant_b     1    a      m
                  5  species_c  plant_c     1    a      m
                  6  species_a  plant_a     1    b      m
                  7  species_a  plant_a     1    b      m
                  8  species_b  plant_b     1    b      m
                  9  species_b  plant_b     1    b      m
                  10 species_c  plant_c     1    b      m
                  11 species_a  plant_a     1    a      f
                  12 species_a  plant_a     1    a      f
                  13 species_b  plant_b     1    a      f
                  14 species_b  plant_b     1    a      f
                  15 species_c  plant_c     1    a      f
                  16 species_a  plant_a     1    b      f
                  17 species_a  plant_a     1    b      f
                  18 species_b  plant_b     1    b      f
                  19 species_b  plant_b     1    b      f
                  20 species_c  plant_c     1    b      f", header = TRUE)

#make sample function
sample_fun <- function(x, size) {
  rows <- sample(seq_len(nrow(x)), size, replace = FALSE)
  intlist <- x[rows, ]
  return(intlist)
}

#convert list to interaction matrix
make_mat <- function(x) {
  mat <- with(x, tapply(value, list(plant_sp, bird_sp), sum))
  mat[is.na(mat)] <- 0
  return(mat)
}

#create vector with required sample size and replication
size_vector <- rep(1:10,5)

#use vector to generate list of interactions
samples_Data <- lapply(size_vector, function(x) sample_fun(df,x))

output <- lapply(samples_Data, make_mat)

library(bipartite)

#calculate connectance on each element (matrix) in output list
#ignore warnings
metrics <- lapply(output, networklevel, index=c("connectance"))
met <- data.frame(unlist(metrics))
names(met) <- names(metrics[[1]])

#Add number of interactions sampled
met$size <- size_vector

1 answer:

Answer 0 (score: 0)

You can split the dataset by site and method with the following command.

df_split <- split(df, paste0(df$site, df$method))

Afterwards you can apply a function to each subset with lapply, e.g.:

lapply(df_split, FUN = nrow)
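
As an alternative sketch, split also accepts a list of grouping vectors, which avoids pasting the two columns into a single key (with paste0 and no separator, the pairs "ab" + "c" and "a" + "bc" would end up in the same group). The toy data frame below is an assumption, used only to keep the example self-contained; drop = TRUE discards site/method combinations that never occur.

```r
# Toy data (hypothetical): 2 sites x 2 methods, 2 rows per combination
df <- data.frame(
  site   = rep(c("a", "b"), each = 2, times = 2),
  method = rep(c("m", "f"), each = 4),
  value  = 1
)

# Split on both columns at once; subset names take the form "site.method"
df_split <- split(df, list(df$site, df$method), drop = TRUE)

names(df_split)          # one name per site/method combination
sapply(df_split, nrow)   # number of rows in each subset
```

Each element of df_split is an ordinary data frame that still carries its own site and method columns, so a function applied to it can read those labels directly.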

To assemble your output you can do, e.g.:

result <- unique(df[, c("site", "method")])
result <- result[order(result$site, result$method),] # !! SEE BELOW
result$rows <- sapply(df_split, FUN = nrow) # sapply gives a plain integer column; lapply would create a list column
result
    site method rows
 11    a      f    5
 1     a      m    5
 16    b      f    5
 6     b      m    5

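Be sure to run the order command! split returns the subsets ordered by the sorted values of the grouping key (alphabetically here), so result has to be sorted the same way for its rows to line up with the subsets.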
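
One way to avoid depending on the row order altogether, sketched here on a small made-up data frame, is to let each subset report its own site and method and then bind the pieces together. The data and the helper name summarise_subset are assumptions for illustration only.

```r
# Toy data (hypothetical), deliberately not sorted by site or method
df <- data.frame(
  site   = c("b", "b", "a", "a", "a"),
  method = c("m", "f", "m", "m", "f")
)

df_split <- split(df, paste0(df$site, df$method))

# Hypothetical helper: each subset describes itself, so no ordering step is needed
summarise_subset <- function(sub) {
  data.frame(site = sub$site[1], method = sub$method[1], rows = nrow(sub))
}

result <- do.call(rbind, lapply(df_split, summarise_subset))
result
```

Because the labels are taken from the subset itself, the result stays correct even if the grouping key or the order of the input rows changes.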

To compute your metric, wrap the code from your question in a function that takes a single subset as its argument, and apply it to each subset with lapply, the same way nrow was applied above.
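
A minimal sketch of that last step follows, under several assumptions. The wrapper run_subset and the helper connectance_fun are hypothetical names. connectance_fun is a dependency-free stand-in that computes the proportion of non-zero cells; to match the question, it could be swapped for bipartite's networklevel(mat, index = "connectance"). Each subset in the example data has only 5 rows, so the sample sizes are capped at nrow(sub), because sampling 6 to 10 rows without replacement would fail.

```r
# Example data rebuilt from the question (20 rows, 5 per site/method combination)
df <- data.frame(
  bird_sp  = rep(c("species_a", "species_a", "species_b", "species_b", "species_c"), 4),
  plant_sp = rep(c("plant_a", "plant_a", "plant_b", "plant_b", "plant_c"), 4),
  value    = 1,
  site     = rep(c("a", "b", "a", "b"), each = 5),
  method   = rep(c("m", "f"), each = 10),
  stringsAsFactors = FALSE
)

# Draw `size` rows without replacement
sample_fun <- function(x, size) {
  x[sample(seq_len(nrow(x)), size, replace = FALSE), , drop = FALSE]
}

# Turn an interaction list into a plant x bird matrix
make_mat <- function(x) {
  mat <- with(x, tapply(value, list(plant_sp, bird_sp), sum))
  mat[is.na(mat)] <- 0
  mat
}

# ASSUMPTION: stand-in for networklevel(mat, index = "connectance")
connectance_fun <- function(mat) sum(mat > 0) / (nrow(mat) * ncol(mat))

# Hypothetical wrapper: runs the whole pipeline on one site/method subset
run_subset <- function(sub, max_size = 10, reps = 5) {
  # Cap the sample size at the number of rows available in this subset
  sizes <- rep(seq_len(min(max_size, nrow(sub))), times = reps)
  conn <- vapply(
    sizes,
    function(s) connectance_fun(make_mat(sample_fun(sub, s))),
    numeric(1)
  )
  data.frame(site = sub$site[1], method = sub$method[1],
             size = sizes, connectance = conn)
}

set.seed(1)  # only to make the random samples reproducible
df_split <- split(df, list(df$site, df$method), drop = TRUE)
result <- do.call(rbind, lapply(df_split, run_subset))
rownames(result) <- NULL
head(result)
```

The combined data frame can then be written to a single file, for instance with write.csv(result, "connectance_by_site_method.csv", row.names = FALSE), where the file name is just a placeholder.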