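I have a list of data frames imported from a folder, and I want to write a function that changes certain values depending on the data frame's title (taken from the file name).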
library(dplyr)
library(lubridate)  # quarter()
library(stringr)    # str_detect()
time_geog <- function(index) {
  index = index %>%
    mutate(Quarter = quarter(as.Date(quarter_date, format = "%d/%m/%Y"),
                             with_year = TRUE),
           Quarter = paste0(substr(as.character(Quarter), 1, 4), "Q",
                            substr(as.character(Quarter), 6, 6)),
           QuarterQF = case_when(Quarter == "2018Q4" ~ "p",
                                 TRUE ~ ""))
  if (str_detect(index, "Title")) {
    index = index %>%
      mutate(var1 = case_when(var1 == "abcd" ~ "code",
                              TRUE ~ var1),
             var2 = case_when(var2 == "abcd" ~ "code",
                              TRUE ~ var2),
             QF1 = case_when(var1 %in% c("value1", "value2") ~ "x",
                             TRUE ~ ""),
             QF2 = case_when(var2 %in% c("value1", "value2") ~ "x",
                             TRUE ~ ""))
  } else {
    index = index %>%
      mutate(var3 = case_when(var3 == "abcd" ~ "code",
                              TRUE ~ var3),
             var4 = case_when(var4 == "abcd" ~ "code",
                              TRUE ~ var4),
             QF1 = case_when(var3 == "value1" ~ "d",
                             var3 %in% c("value2", "value3") ~ "x",
                             TRUE ~ ""))
  }
}
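The Quarter step turns each date into a label such as 2018Q4. For comparison, the same label can be built in base R without lubridate. This is a minimal sketch that assumes the dates are dd/mm/yyyy strings (the format used above); quarter_label is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: build "YYYYQn" labels from "dd/mm/yyyy" strings using base R only.
quarter_label <- function(x) {
  d <- as.Date(x, format = "%d/%m/%Y")
  month <- as.integer(format(d, "%m"))
  # Months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4
  paste0(format(d, "%Y"), "Q", (month - 1) %/% 3 + 1)
}

quarter_label(c("15/11/2018", "01/01/2019"))  # "2018Q4" "2019Q1"
```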
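I have put this function into a for loop like the one below, which also reads in all the files I need and assigns each one a name based on its original file name.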
library(readr)  # read_csv()
for (i in names) {
  filepath <- file.path(files, paste0(i, ".csv"))
  assign(substr(i, 10, nchar(i)), read_csv(filepath))
  time_geog(get(substr(i, 10, nchar(i))))
}
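It works when I pass a specific file to it, but not when I run the loop. I also have no problem reading in the files I need with the titles I want. And I don't want them all ending up in the same data frame afterwards, which is what happens if I use: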
for (i in names) {
  filepath <- file.path(files, paste0(i, ".csv"))
  assign(substr(i, 10, nchar(i)), read_csv(filepath))
  i <- time_geog(get(substr(i, 10, nchar(i))))
}
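Two separate things go wrong in these loops, which can be seen with a toy function (double_it is a hypothetical stand-in for time_geog). In the first loop the return value is computed and then discarded, because a for loop neither prints nor stores it. In the second, i <- ... overwrites the loop variable, so only the result of the last iteration survives.

```r
double_it <- function(x) x * 2  # hypothetical stand-in for time_geog()

# Loop 1: the return value is computed and immediately thrown away.
for (v in 1:3) {
  double_it(v)
}

# Loop 2: the loop variable is overwritten, so each result replaces the previous one.
for (v in 1:3) {
  v <- double_it(v)
}
v  # only the last result remains: 6

# Storing each result in its own list slot keeps all of them.
results <- list()
for (v in 1:3) {
  results[[v]] <- double_it(v)
}
unlist(results)  # 2 4 6
```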
Any help would be greatly appreciated. I feel like I'm really close but just missing some important piece of knowledge!
Answer 0 (score: 0)
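In general, the cleanest way to handle several data frames at once is to store them in a list and then use lapply or another mapping function to apply your function to every element of the list.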
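A minimal sketch of that pattern on two toy data frames (the frame names and the recode_abcd function are hypothetical, and base R is used so the example has no dependencies): the function is written for a single data frame, and lapply applies it to every element while keeping the list's names.

```r
# Two toy frames stored in a named list instead of as separate objects.
frames <- list(
  TitleSales = data.frame(var1 = c("abcd", "z"), stringsAsFactors = FALSE),
  Costs      = data.frame(var1 = c("q", "abcd"), stringsAsFactors = FALSE)
)

# Hypothetical per-frame function: replace "abcd" with "code" in var1.
recode_abcd <- function(df) {
  df$var1[df$var1 == "abcd"] <- "code"
  df
}

# lapply() calls recode_abcd() once per element and returns a list with the same names.
recoded <- lapply(frames, recode_abcd)
names(recoded)      # "TitleSales" "Costs"
recoded$Costs$var1  # "q" "code"
```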
I don't have your data, but based on the code you provided, you could try:
library(readr)  # read_csv()
library(dplyr)  # bind_rows()
# create an empty list (you may want to preallocate it if you know how many files there are)
df_list <- list()
# store every data frame in the list
for (i in names) {
  filepath <- file.path(files, paste0(i, ".csv"))
  df_list[[length(df_list) + 1]] <- read_csv(filepath)
}
# apply your function to every element of the list
df_list_new <- lapply(df_list, time_geog)
# merge the list into one master data frame (`bind_rows()` comes from the `dplyr` package)
df_master <- bind_rows(df_list_new)
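Since the question asks to keep the frames separate, one option is to skip the bind_rows() step and name each list element inside the loop, reusing the substr(i, 10, nchar(i)) naming from the question. A minimal sketch, assuming hypothetical file names and a stub reader (read_stub stands in for read_csv so the example runs without any files):

```r
# Hypothetical file names: a 9-character prefix followed by the title,
# matching the substr(i, 10, nchar(i)) naming used in the question.
file_names <- c("2019_01__TitleSales", "2019_01__Costs")

# Stub standing in for readr::read_csv(): records the path it was given.
read_stub <- function(path) data.frame(source = path, stringsAsFactors = FALSE)

df_list <- list()
for (i in file_names) {
  filepath <- file.path("data", paste0(i, ".csv"))
  # Use the trimmed file name as the list element's name.
  df_list[[substr(i, 10, nchar(i))]] <- read_stub(filepath)
}

names(df_list)             # "TitleSales" "Costs"
df_list[["Costs"]]$source  # "data/2019_01__Costs.csv"
```

Each frame is then reachable by name, e.g. df_list[["TitleSales"]], without creating separate global variables.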
Answer 1 (score: 0)
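You mention a "list of data frames", but your code uses assign, which I generally don't recommend. Instead, you can read all of your files into a single named list with
list_of_frames <- setNames(lapply(file.path(files, paste0(names, ".csv")), readr::read_csv),
                           names)
(or use substr(names, 10, nchar(names)) as the list names, to match your loop).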
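Your function tries to get the object's name by matching against the object itself: str_detect(index, "Title") searches the contents of the data frame, not its name. Although there are ways to recover the name of an argument (for example deparse(substitute(index)), rather than what you tried), they do not work in all cases, for instance when the function is called through lapply, so I suggest not relying on them.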
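A small illustration of why deparse(substitute()) is fragile (arg_name and my_frame are hypothetical names): it reports the expression the caller wrote, which is the variable name in a direct call but an internal expression when the function is called through lapply.

```r
# Hypothetical helper that returns the expression passed as `x`, as a string.
arg_name <- function(x) deparse(substitute(x))

my_frame <- data.frame(a = 1)

direct <- arg_name(my_frame)  # "my_frame"
via_lapply <- lapply(list(my_frame), arg_name)[[1]]
via_lapply  # an internal expression built by lapply(), not "my_frame"
```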
Instead, I suggest passing the data's name to the function as an extra argument. Perhaps something like this (untested):
library(dplyr)
library(lubridate)  # quarter()
library(stringr)    # str_detect()
time_geog <- function(index, name) {
  index = index %>%
    mutate(Quarter = quarter(as.Date(quarter_date, format = "%d/%m/%Y"),
                             with_year = TRUE),
           Quarter = paste0(substr(as.character(Quarter), 1, 4), "Q",
                            substr(as.character(Quarter), 6, 6)),
           QuarterQF = case_when(Quarter == "2018Q4" ~ "p",
                                 TRUE ~ ""))
  # branch on the name passed in, not on the data frame's contents
  if (str_detect(name, "Title")) {
    index = index %>%
      mutate(var1 = case_when(var1 == "abcd" ~ "code",
                              TRUE ~ var1),
             var2 = case_when(var2 == "abcd" ~ "code",
                              TRUE ~ var2),
             QF1 = case_when(var1 %in% c("value1", "value2") ~ "x",
                             TRUE ~ ""),
             QF2 = case_when(var2 %in% c("value1", "value2") ~ "x",
                             TRUE ~ ""))
  } else {
    index = index %>%
      mutate(var3 = case_when(var3 == "abcd" ~ "code",
                              TRUE ~ var3),
             var4 = case_when(var4 == "abcd" ~ "code",
                              TRUE ~ var4),
             # TRUE ~ "" must come last: case_when() uses the first matching condition
             QF1 = case_when(var3 == "value1" ~ "d",
                             var3 %in% c("value2", "value3") ~ "x",
                             TRUE ~ ""))
  }
  return(index)
}
out <- Map(time_geog, list_of_frames, names(list_of_frames))
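Map() walks over the frames and their names in parallel, so each call receives a matching (frame, name) pair. A toy version with a hypothetical flag_by_name function, using base R grepl() in place of str_detect():

```r
frames <- list(
  TitleSales = data.frame(x = 1:2),
  Costs      = data.frame(x = 3)
)

# Hypothetical function whose behaviour depends on the name it is given.
flag_by_name <- function(df, name) {
  df$flag <- if (grepl("Title", name)) "title-file" else "other-file"
  df
}

# Map() calls flag_by_name(frames[[k]], names(frames)[k]) for each k;
# the result keeps the names of the first argument.
flagged <- Map(flag_by_name, frames, names(frames))
flagged$TitleSales$flag  # "title-file" "title-file"
flagged$Costs$flag       # "other-file"
```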