Loop through rows in a list of data frames and extract data (nested "apply" functions)

Time: 2017-06-01 21:12:25

Tags: r dataframe nested-loops

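I am new to R and trying to do things the "R" way, which means no for loops. I want to loop through a list of data frames, loop through each row of each data frame, extract data based on a condition, and store it in a master data frame.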

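One of the problems I am running into is accessing the "global" data frame. I am not sure what the best approach is (global variable, pass by reference).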
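On the "global" data frame concern: ordinary R objects such as data frames follow copy-on-modify semantics, so a plain assignment inside the function passed to lapply() does not change a data frame in the calling environment. A minimal sketch of the two usual options, using made-up toy data (the names `pieces`, `master_global`, and `master_returned` are hypothetical and not part of the example in this question): either write to the enclosing environment with `<<-`, or return a piece from each call and combine the pieces afterwards.

```r
# Toy input: a list of two small data frames (illustrative only)
pieces <- list(
  data.frame(x = 1:2, stringsAsFactors = FALSE),
  data.frame(x = 3:4, stringsAsFactors = FALSE)
)

# Option 1: superassignment (<<-) modifies a variable outside the function.
master_global <- data.frame(x = integer())
invisible(lapply(pieces, function(df) {
  master_global <<- rbind(master_global, df)
}))

# Option 2: return each piece and combine once at the end (no side effects).
master_returned <- do.call(rbind, lapply(pieces, function(df) df))

# Compare the two results.
identical(master_global$x, master_returned$x)
```

The second option is usually easier to reason about, since each function call depends only on its arguments.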

I created an abstract example that tries to show what needs to be done:

rm(list=ls())## CLEAR WORKSPACE
assign("last.warning", NULL, envir = baseenv())## CLEAR WARNINGS

# Generate a descriptive name with name and size
generateDescriptiveName <- function(animal.row, animalList.vector){

   name <- animal.row["animal"]
   size <- animal.row["size"]

   # if in list of interest prepare name for master dataframe
   if (any(grepl(name, animalList.vector))){
     return (paste0(name, " Sz-", size))
   }

}

# Animals of interest
animalList.vector <- c("parrot", "cheetah", "elephant", "deer", "lizard")

jungleAnimals <- c("ants", "parrot", "cheetah")
jungleSizes <- c(0.1, 1, 50)
jungle.df <- data.frame(jungleAnimals, jungleSizes)


fieldAnimals <- c("elephant", "lion", "hyena")
fieldSizes <- c(1000, 100, 80)
field.df <- data.frame(fieldAnimals, fieldSizes)

forestAnimals <- c("squirrel", "deer", "lizard")
forestSizes <- c(1, 40, 0.2)
forest.df <- data.frame(forestAnimals, forestSizes)

ecosystems.list <- list(jungle.df, field.df, forest.df)

# Final master list
descriptiveAnimal.df <- data.frame(name = character(), descriptive.name = character())

# apply to all dataframes in list
lapply(ecosystems.list, function(ecosystem.df){
  names(ecosystem.df) <- c("animal", "size")
  # apply to each row in dataframe
  output <- apply(ecosystem.df, 1, function(row){generateDescriptiveName(row, animalList.vector)})
  if(!is.null(output)){
    # Add generated names to unique master list (no duplicates)
  }
})

The final result would be:

         name        descriptive.name
1    "parrot"           "parrot Sz-1"
2   "cheetah"         "cheetah Sz-50"
3  "elephant"      "elephant Sz-1000"
4      "deer"            "deer Sz-40"
5    "lizard"         "lizard Sz-0.2"
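One detail worth checking in the attempt above: apply() converts the data frame to a matrix before iterating, and because the animal column is text, the whole matrix becomes character, with the numeric column passed through format() (so the sizes may come back padded or with extra decimals). A small sketch of the effect on the jungle data; the names `jungle` and `rows_seen` are hypothetical:

```r
jungle <- data.frame(
  animal = c("ants", "parrot", "cheetah"),
  size   = c(0.1, 1, 50),
  stringsAsFactors = FALSE
)

# Collect the size value as it appears to the function inside apply().
rows_seen <- apply(jungle, 1, function(row) row[["size"]])

# The sizes arrive as strings, not numbers.
is.character(rows_seen)

# Working on the columns directly keeps the numeric type intact.
is.numeric(jungle$size)
paste0(jungle$animal, " Sz-", jungle$size)
```

Vectorised column operations like the last line sidestep the coercion entirely, which is one reason row-wise apply() on a mixed-type data frame is often avoided.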

1 Answer:

Answer 0: (score: 0)

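I did not use your function generateDescriptiveName(), because I think it is a bit too laborious. I also see no reason to nest apply() inside lapply(). Here is my attempt at generating the desired output. It is not perfect, but I hope it helps.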

df_list <- lapply(ecosystems.list, function(ecosystem.df){
  names(ecosystem.df) <- c("animal", "size")
  temp <- ecosystem.df[ecosystem.df$animal %in% animalList.vector, ]
  if(nrow(temp) > 0){
    data.frame(name = temp$animal, descriptive.name = paste0(temp$animal, " Sz-", temp$size))
  }
})

do.call("rbind",df_list)
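For completeness, here is a self-contained sketch of the same idea that also covers the "no duplicates" requirement from the question. It assumes exact name matching with %in% is what is wanted (rather than the pattern matching done by grepl()), and the helper name `describe_ecosystem` and the column names `a` and `s` are hypothetical:

```r
animalList.vector <- c("parrot", "cheetah", "elephant", "deer", "lizard")

ecosystems.list <- list(
  data.frame(a = c("ants", "parrot", "cheetah"), s = c(0.1, 1, 50),
             stringsAsFactors = FALSE),
  data.frame(a = c("elephant", "lion", "hyena"), s = c(1000, 100, 80),
             stringsAsFactors = FALSE),
  data.frame(a = c("squirrel", "deer", "lizard"), s = c(1, 40, 0.2),
             stringsAsFactors = FALSE)
)

# Hypothetical helper: keep the animals of interest from one data frame.
describe_ecosystem <- function(ecosystem.df, wanted) {
  names(ecosystem.df) <- c("animal", "size")
  keep <- ecosystem.df[ecosystem.df$animal %in% wanted, ]
  data.frame(
    name = keep$animal,
    # sprintf() returns a zero-length vector when no rows match,
    # so no explicit nrow() check is needed here.
    descriptive.name = sprintf("%s Sz-%s", keep$animal, keep$size),
    stringsAsFactors = FALSE
  )
}

pieces <- lapply(ecosystems.list, describe_ecosystem, wanted = animalList.vector)

# Combine all pieces and drop duplicated rows.
descriptiveAnimal.df <- unique(do.call(rbind, pieces))
rownames(descriptiveAnimal.df) <- NULL
descriptiveAnimal.df
```

Setting stringsAsFactors = FALSE keeps the names as plain character vectors, which avoids factor-level surprises when the pieces are bound together.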