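How do I run the same code on multiple data frames?

Time: 2019-09-27 16:05:42

Tags: r dplyr

I have a list of data frames imported from a folder, and I want to write a function that changes certain values depending on the title of each data frame (taken from the file name).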

library(dplyr)
library(lubridate)  # quarter()
library(stringr)    # str_detect()
library(readr)      # read_csv()

time_geog <- function(index) {
  index = index %>%
    mutate(Quarter = quarter(as.Date(quarter_date, format = "%d/%m/%Y"),
                             with_year = TRUE),
           Quarter = paste0(substr(as.character(Quarter), 1, 4),"Q",
                            substr(as.character(Quarter), 6, 6)),
           QuarterQF = case_when(Quarter == "2018Q4" ~ "p",
                                 TRUE ~ ""))
  if(str_detect(index, "Title")) {
    index = index %>% 
      mutate(var1 = case_when(var1 == "abcd" ~ "code",
                                             TRUE ~ var1),
             var2 = case_when(var2 == "abcd" ~ "code",
                                           TRUE ~ var2),
             QF1 = case_when(var1 %in% c("value1", "value2") ~ "x",
                                   TRUE ~ ""),
             QF2 = case_when(var2 %in% c("value1", "value2") ~ "x",
                                   TRUE ~ ""))
  } else {
    index = index %>%
      mutate(var3 = case_when(var3 == "abcd" ~ "code",
                                 TRUE ~ var3),
             var4 = case_when(var4 == "abcd" ~ "code",
                                  TRUE ~ var4),
             QF1 = case_when(var3 == "value1" ~ "d",
                             var3 %in% c("value2", "value3") ~ "x",
                             TRUE ~ ""))
  }
}

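I have put this function in the for loop shown below, which also reads all the files I need and assigns each one a name based on its original file name.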

for (i in names) {
  filepath <- file.path(files, paste0(i, ".csv"))
  assign(substr(i, 10, nchar(i)), read_csv(filepath)) 
  time_geog(get(substr(i, 10, nchar(i))))
}

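It works when I pass a specific file to it, but not when I run the loop. I have no trouble reading the files I need with the titles I want. I also don't want them all to end up in the same data frame afterwards, which is what happens if I use: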

for (i in names) {
  filepath <- file.path(files, paste0(i, ".csv"))
  assign(substr(i, 10, nchar(i)), read_csv(filepath)) 
  i <- time_geog(get(substr(i, 10, nchar(i))))
}

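Any help would be greatly appreciated. I feel like I'm really close, but just missing some vital piece of knowledge!

2 Answers:

Answer 0: (score: 0)

Always remember that the right way to handle multiple data frames at once is to store them in a list, and then use lapply or another mapping function to apply a function to every element of the list in one call.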

I don't have your data, but based on the code you provided, you could try:

# create an empty list (you may want to specify the length of the list if you know the total number of your files)   
df_list <- list()

# read each file and store the resulting data frame in the list
for (i in names) {
  filepath <- file.path(files, paste0(i, ".csv"))
  df_list[[length(df_list)+1]] <- read_csv(filepath) 
}

# apply your function to the list
df_list_new <- lapply(df_list,time_geog)

# merge the list into one master dataframe (`bind_rows()` comes from `dplyr` package)
df_master <- bind_rows(df_list_new)
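If the frames should stay separate rather than be merged, the bind_rows() step can simply be skipped. The following is a minimal base-R sketch of that idea, not the original poster's data or function: the toy data frames, the element names, and add_double() are hypothetical stand-ins for the imported files and for time_geog().

```r
# Toy data frames standing in for the imported CSV files (hypothetical)
df_list <- list(
  Title_a = data.frame(x = 1:3),
  other_b = data.frame(x = 4:6)
)

# Hypothetical stand-in for time_geog(): adds one derived column
add_double <- function(df) {
  df$x2 <- df$x * 2
  df
}

# lapply() returns a list of the same length, with the names carried over
df_list_new <- lapply(df_list, add_double)

names(df_list_new)       # same names as df_list
df_list_new$other_b$x2   # each frame is still a separate list element
```

Naming the list elements when the files are read makes it possible to pull out a single frame later with $ or [[ ]].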

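Answer 1: (score: 0)

You mention a "list of data frames", but your code shows you using assign, which I generally recommend against. If your frames are similar enough, you can use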

list_of_frames <- setNames(lapply(paste0(files, ".csv"), readr::read_csv),
                           files)

(or some substr of the file names).
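As a rough, self-contained illustration of building such a named list, the sketch below writes two small CSV files to a temporary folder and reads them back. The file names, the temporary folder, and the use of base read.csv() instead of readr::read_csv() are assumptions made so the example runs on its own; the substr() offset of 10 mirrors the one used in the question.

```r
# Write two small CSV files to a temporary folder (hypothetical data)
folder <- tempfile("frames_")
dir.create(folder)
write.csv(data.frame(x = 1:2), file.path(folder, "20190927_Title_a.csv"),
          row.names = FALSE)
write.csv(data.frame(x = 3:4), file.path(folder, "20190927_other_b.csv"),
          row.names = FALSE)

# Collect the full paths, then derive short names from the file names
paths <- list.files(folder, pattern = "\\.csv$", full.names = TRUE)
base_names <- tools::file_path_sans_ext(basename(paths))
short_names <- substr(base_names, 10, nchar(base_names))

# One named list instead of many variables created with assign()
list_of_frames <- setNames(lapply(paths, read.csv), short_names)
names(list_of_frames)
```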

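Your function tries to get the name of the object by matching against the object itself. While there are ways to do that (e.g. deparse / substitute, which is not what you attempted), they do not work in all situations, and I suggest you do not rely on them.

Instead, I suggest you pass the name of the data to the function explicitly. Perhaps something like this (untested):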
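To see why recovering the name from the object is fragile, consider this small base-R sketch. show_name() and the toy frame are hypothetical; the point is only that deparse(substitute()) reports the expression used at the call site, which stops being the variable name once the call goes through lapply.

```r
# Hypothetical helper that tries to recover the name of its argument
show_name <- function(df) deparse(substitute(df))

Title_sales <- data.frame(x = 1)

# Called directly, the argument expression is the variable name
direct <- show_name(Title_sales)

# Called through lapply(), the argument expression is lapply's internal
# indexing expression (typically "X[[i]]"), not the original name
frames <- list(Title_sales = Title_sales)
via_lapply <- lapply(frames, show_name)[[1]]

direct
via_lapply
```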

time_geog <- function(index, name) {
  index = index %>%
    mutate(Quarter = quarter(as.Date(quarter_date, format = "%d/%m/%Y"),
                             with_year = TRUE),
           Quarter = paste0(substr(as.character(Quarter), 1, 4),"Q",
                            substr(as.character(Quarter), 6, 6)),
           QuarterQF = case_when(Quarter == "2018Q4" ~ "p",
                                 TRUE ~ ""))
  if(str_detect(name, "Title")) {
    index = index %>% 
      mutate(var1 = case_when(var1 == "abcd" ~ "code",
                                             TRUE ~ var1),
             var2 = case_when(var2 == "abcd" ~ "code",
                                           TRUE ~ var2),
             QF1 = case_when(var1 %in% c("value1", "value2") ~ "x",
                                   TRUE ~ ""),
             QF2 = case_when(var2 %in% c("value1", "value2") ~ "x",
                                   TRUE ~ ""))
  } else {
    index = index %>%
      mutate(var3 = case_when(var3 == "abcd" ~ "code",
                                 TRUE ~ var3),
             var4 = case_when(var4 == "abcd" ~ "code",
                                  TRUE ~ var4),
             # a catch-all TRUE must come last, or later conditions never match
             QF1 = case_when(var3 == "value1" ~ "d",
                             var3 %in% c("value2", "value3") ~ "x",
                             TRUE ~ ""))
  }
  return(index)
}
# pass each frame together with its own name
out <- Map(time_geog, list_of_frames, names(list_of_frames))
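The pattern of passing the name alongside the frame can be tried on toy data without any packages. In this sketch, flag_by_name() and the two small frames are hypothetical stand-ins for time_geog() and the imported files, and grepl() plays the role of str_detect().

```r
# Hypothetical stand-in for time_geog(index, name): branch on the name
flag_by_name <- function(df, name) {
  df$flag <- if (grepl("Title", name, fixed = TRUE)) "x" else "d"
  df
}

frames <- list(
  Title_a = data.frame(v = 1:2),
  other_b = data.frame(v = 3:4)
)

# Map() walks the frames and their names in parallel and keeps the names
out <- Map(flag_by_name, frames, names(frames))

out$Title_a$flag  # "x" "x"
out$other_b$flag  # "d" "d"
```

The result is still a list with one element per file, so nothing is merged. If purrr is available, purrr::imap(frames, flag_by_name) expresses the same idea, since imap() passes each element together with its name.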