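I have a large dataset of xlsx files: Data_203001, Data_203002, ... I have done some analysis on them (one file at a time), and I would like to run it over the whole set, file by file, and save the output as Results_203001, Results_203002, and so on. So I guess I am looking for some guidance on how to assign the path name to a variable, use that same variable to build the name of the results file, and do this in a loop for the whole set.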
Thanks for your help. I am fairly new to R, so I appreciate any help.
Answer 0: (score: 2)
Something like this should do the trick:
# list all xlsx data files
all_files <- list.files(path = "path_to_data",
                        pattern = "Data_\\d+\\.xlsx$",
                        full.names = TRUE)
# process each file:
lapply(X = all_files,
       FUN = function(path) {
         # read your data
         df <- openxlsx::read.xlsx(path)
         # do your transformation
         df_out <- some_transformations_to_your_data(df)
         # replace "Data" with "Results" in the input path to get the new filename:
         path_out <- sub(pattern = "Data", replacement = "Results", x = path)
         # write result to new filename:
         openxlsx::write.xlsx(x = df_out, file = path_out)
       })
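The file-name mapping is the step most likely to go wrong, so here is a minimal sketch of it in isolation. `results_path()` is a hypothetical helper name, not part of any package, and the example paths are made up. It rewrites only the file name, so a directory whose name happens to contain "Data" is left alone.

```r
# Hypothetical helper: map ".../Data_203001.xlsx" to ".../Results_203001.xlsx".
results_path <- function(path) {
  # Rewrite only the file name, never the directory part of the path.
  new_name <- sub(pattern = "^Data_", replacement = "Results_", x = basename(path))
  file.path(dirname(path), new_name)
}

# Example paths (made up for illustration)
example_paths <- c("path_to_data/Data_203001.xlsx",
                   "Data_archive/Data_203002.xlsx")
example_outputs <- results_path(example_paths)
print(example_outputs)
```

Inside the `lapply()` call, the result of such a helper could be passed as the `file` argument when writing.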
If you need more outputs, as described in the comments, do the following:
# df and path are the data and input path from the enclosing function
for (transformation_type in c("epi", "miss", "algor")) {
  openxlsx::write.xlsx(
    x = do.call(what = paste0("transformation_", transformation_type), args = list(df = df)),
    file = sub(pattern = "Data", replacement = paste0(transformation_type, "_Results"), x = path)
  )
}
so that the whole expression becomes:
# list all xlsx data files
all_files <- list.files(path = "path_to_data",
                        pattern = "Data_\\d+\\.xlsx$",
                        full.names = TRUE)
# process each file:
lapply(X = all_files,
       FUN = function(path) {
         # read your data
         df <- openxlsx::read.xlsx(path)
         # apply each transformation and write its result to its own file
         for (transformation_type in c("epi", "miss", "algor")) {
           openxlsx::write.xlsx(
             x = do.call(what = paste0("transformation_", transformation_type), args = list(df = df)),
             file = sub(pattern = "Data", replacement = paste0(transformation_type, "_Results"), x = path)
           )
         }
       })
This works if your functions are named after the transformations, i.e. a function transformation_epi() produces the epi output, and so on.
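To make the name-based dispatch concrete, here is a small sketch with toy functions. The three `transformation_*` bodies and the example data frame are invented stand-ins; only the naming pattern comes from the answer. `do.call()` accepts a function name as a character string, which is what lets `paste0()` select the function.

```r
# Toy stand-ins for the real transformations (hypothetical; replace with your own).
transformation_epi   <- function(df) df[which(df$value > 0), , drop = FALSE]
transformation_miss  <- function(df) df[is.na(df$value), , drop = FALSE]
transformation_algor <- function(df) transform(df, value = value * 2)

# Made-up input data
df <- data.frame(id = 1:4, value = c(1, NA, -3, 5))

results <- list()
for (transformation_type in c("epi", "miss", "algor")) {
  # Build the function name as a string, then call that function on df.
  fun_name <- paste0("transformation_", transformation_type)
  results[[transformation_type]] <- do.call(what = fun_name, args = list(df = df))
}

# Number of rows each transformation returned
print(sapply(results, nrow))
```

If a function with the constructed name does not exist, `do.call()` stops with an error, so a typo in the type vector shows up immediately.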
Answer 1: (score: 0)
You can list the files with the xlsx extension:
setwd(".../folder_with_excel")
# pattern is a regular expression, not a glob: match names ending in ".xlsx"
file.list <- list.files(pattern = "\\.xlsx$")
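Then you can build loc by applying a function that reads each file. You can specify which sheet to read ("A" in this case) and which columns to select. In addition, create a variable that identifies the file each row was imported from.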
loc = lapply(file.list, function(i){
  # read_xlsx() comes from the readxl package
  x = readxl::read_xlsx(i, sheet = "A")
  # Get the columns
  x = x[, c("col1", "col2", "col3")]
  # Add a column to say which file they're from
  x$file = i
  # Return data
  x
})
This returns a list of data frames. You can use rbindlist (from the data.table package) to combine them into a single data frame.
# Transform the list into a data frame
all_excel_df = data.table::rbindlist(loc)
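If you would rather not depend on data.table, base R has a counterpart for the combination step. The sketch below uses two made-up data frames in place of the ones read from Excel, and it assumes every frame has the same column names. Unlike rbindlist, it returns a plain data.frame.

```r
# Made-up stand-ins for the data frames that lapply() would return
loc <- list(
  data.frame(col1 = 1:2, col2 = c("a", "b"), file = "Data_203001.xlsx"),
  data.frame(col1 = 3:4, col2 = c("c", "d"), file = "Data_203002.xlsx")
)

# Stack the frames row by row; rbind() matches columns by name.
all_excel_df <- do.call(rbind, loc)
print(all_excel_df)
```

The `file` column keeps track of which workbook each row came from, which is useful when the combined table is filtered or summarised later.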
Answer 2: (score: -1)
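You can use a function such as paste0 to paste the parts of the file names together inside a for loop. (The code block that followed here was not preserved.)
Sorry for the clumsy "code-like" structure above, but you didn't provide any information about the process that turns "Data_" into "Results_".
However, for loops are slow in R, so the next step might be to use walk from the purrr package.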
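As a rough illustration of that idea (a sketch under stated assumptions, not a reconstruction of this answer's code), the loop below builds matching input and output names with `paste0()`. The IDs are made up, and the read/transform/write step is left as a comment because the transformation itself was never described.

```r
# Made-up file IDs; in practice these would come from the files on disk
ids <- c("203001", "203002", "203003")

name_pairs <- vector("list", length(ids))
for (k in seq_along(ids)) {
  in_file  <- paste0("Data_", ids[k], ".xlsx")
  out_file <- paste0("Results_", ids[k], ".xlsx")
  # ... read in_file, transform the data, write it to out_file here ...
  name_pairs[[k]] <- c(input = in_file, output = out_file)
}

# One row per file: the input name next to its output name
print(do.call(rbind, name_pairs))
```

With purrr installed, the loop body could be wrapped in a function and passed to `purrr::walk()`, which calls it once per ID purely for its side effects.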
Answer 3: (score: -1)
I would use the dir() function to get the names of the files in the folder/directory.
library(readxl)   # reading
library(WriteXLS) # writing
for (i in dir("data_files_folder")) {
  cat(i, "\n")
  d <- read_excel(path = file.path("data_files_folder", i))
  # something done with it
  WriteXLS(d, ExcelFileName = file.path("keep_files_here", i))
}