Reading and processing multiple files in R

Posted: 2018-04-20 11:23:52

Tags: r

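I have a huge dataset of xlsx files: Data_203001, Data_203002, ... I have done some analysis on them (each time using a single file), and now I want to run it over the whole set, one file after another, and save each result as Results_203001, Results_203002, and so on. So I guess I am looking for some guidance on how to assign a variable to the path name, then reuse that variable to build the result name when saving, and do this in a loop for the whole set.

Thanks for your help. I am quite new to R, so I appreciate any help.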

4 answers:

Answer 0 (score: 2)

Something like this should do the trick:

# list all xlsx data
all_files <- list.files(path = "path_to_data", 
                        pattern = "Data_\\d+\\.xlsx$", 
                        full.names = TRUE)
# process each file:
lapply(X = all_files,
       FUN = function(path) {
  # read your data
  df <- openxlsx::read.xlsx(path)
  # do your transformation (placeholder for your own function)
  df_out <- some_transformations_to_your_data(df)

  # replace "Data" with "Results" to get the new file name
  # (assumes "Data" does not also appear in the folder part of the path):
  path_out <- sub(pattern = "Data", replacement = "Results", x = path)

  # write result to new filename:
  openxlsx::write.xlsx(x = df_out, file = path_out)
})
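To try the naming logic without real Excel files, here is a minimal base-R sketch of the same loop. It rests on a few assumptions of its own: it writes two made-up CSV files (`Data_203001.csv`, `Data_203002.csv`) to a temporary folder, uses `read.csv()`/`write.csv()` in place of `openxlsx`, and `double_values()` is a hypothetical stand-in for your transformation. It also swaps the prefix on the file name only, so a folder that happens to contain "Data" is left alone.

```r
# Hypothetical transformation standing in for your own analysis
double_values <- function(df) {
  df$value <- df$value * 2
  df
}

# Throwaway input folder with two made-up files (CSV instead of xlsx)
in_dir <- file.path(tempdir(), "demo_data")
dir.create(in_dir, showWarnings = FALSE)
for (id in c("203001", "203002")) {
  write.csv(data.frame(value = 1:3),
            file.path(in_dir, paste0("Data_", id, ".csv")),
            row.names = FALSE)
}

# List only the input files; the regex is matched against file names, not folders
all_files <- list.files(in_dir, pattern = "^Data_\\d+\\.csv$", full.names = TRUE)

# Process each file and collect the output paths
out_files <- vapply(all_files, function(path) {
  df_out <- double_values(read.csv(path))
  # Swap "Data_" for "Results_" in the file name only, keeping the folder unchanged
  path_out <- file.path(dirname(path), sub("^Data_", "Results_", basename(path)))
  write.csv(df_out, path_out, row.names = FALSE)
  path_out
}, character(1), USE.NAMES = FALSE)

print(basename(out_files))
```

With real Excel files you would switch back to `openxlsx::read.xlsx()`/`openxlsx::write.xlsx()` and use a pattern ending in `\\.xlsx$`.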

Edit:

If you need more outputs, as described in the comments, do the following:

for (transformation_type in c("epi", "miss", "algor")) {
  openxlsx::write.xlsx(
    x = do.call(what = paste0("transformation_", transformation_type), args = list(df = df)), 
    file = sub(pattern = "Data", replacement = paste0(transformation_type, "_Results"), x = path)
  )
}

The whole expression then becomes:

# list all xlsx data
all_files <- list.files(path = "path_to_data", 
                        pattern = "Data_\\d+\\.xlsx$", 
                        full.names = TRUE)
# process each file:
lapply(X = all_files,
       FUN = function(path) {
         # read your data
         df <- openxlsx::read.xlsx(path)
         # do your transformations, writing one output file per type,
         # e.g. Data_203001.xlsx -> epi_Results_203001.xlsx
         for (transformation_type in c("epi", "miss", "algor")) {
           openxlsx::write.xlsx(
             x = do.call(what = paste0("transformation_", transformation_type), args = list(df = df)), 
             file = sub(pattern = "Data", replacement = paste0(transformation_type, "_Results"), x = path)
           )
         }
       })

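This works if your transformation functions are named after the output they produce, i.e. a function transformation_epi() produces the epi output, and so on.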
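`do.call()` accepts the function name as a character string, which is what lets `paste0("transformation_", transformation_type)` select the right function. A minimal sketch with three hypothetical `transformation_*()` functions (placeholders for your own analyses) and a made-up data frame shows the dispatch and the resulting file names, without writing anything to disk:

```r
# Hypothetical transformations; replace with your own analyses
transformation_epi   <- function(df) data.frame(n_rows = nrow(df))
transformation_miss  <- function(df) data.frame(n_missing = sum(is.na(df)))
transformation_algor <- function(df) data.frame(col_means = colMeans(df, na.rm = TRUE))

df   <- data.frame(a = c(1, NA, 3), b = c(4, 5, NA))  # made-up input data
path <- "path_to_data/Data_203001.xlsx"               # hypothetical input path

results <- list()
for (transformation_type in c("epi", "miss", "algor")) {
  # do.call() looks the function up by name, e.g. "transformation_miss"
  out <- do.call(what = paste0("transformation_", transformation_type), args = list(df = df))
  # Same renaming rule as above: "Data" -> "<type>_Results"
  out_name <- sub(pattern = "Data", replacement = paste0(transformation_type, "_Results"), x = path)
  results[[out_name]] <- out
}

print(names(results))
```

If a name has no matching function, `do.call()` stops with a "could not find function" error, so a typo in the vector of types shows up immediately.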

Answer 1 (score: 0)

You can create a list of the files with the xlsx extension:

setwd(".../folder_with_excel")  
file.list <- list.files(pattern = "\\.xlsx$")

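Then you can read every file with lapply() and store the results in loc. You can choose which sheet to read ("A" in this case) and which columns to keep. Also, add a variable that identifies which file each row was imported from.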

    library(readxl)

    loc <- lapply(file.list, function(i) {
      x <- read_xlsx(i, sheet = "A")

      # Get the columns
      x <- x[, c("col1", "col2", "col3")]

      # Add a column to say which file they're from
      x$file <- i

      # Return data
      x
    })

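This returns a list of data frames. You can combine them into a single data frame with rbindlist() from the data.table package.

    library(data.table)

    # Transform the list into a data frame
    all_excel_df <- rbindlist(loc)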
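If you would rather not depend on data.table, base R's `do.call(rbind, ...)` also stacks a list of data frames, as long as they share the same column names. A minimal sketch, using two made-up data frames and hypothetical file names in place of the sheets read by `read_xlsx()`:

```r
# Two made-up data frames standing in for sheets read from two files
loc <- list(
  data.frame(col1 = 1:2, col2 = c("a", "b")),
  data.frame(col1 = 3L,  col2 = "c")
)
file.list <- c("Data_203001.xlsx", "Data_203002.xlsx")  # hypothetical names

# Tag each data frame with the file it came from, as in the answer above
loc <- Map(function(x, i) { x$file <- i; x }, loc, file.list)

# Stack them row-wise; rbind() matches data frame columns by name
all_excel_df <- do.call(rbind, loc)
print(all_excel_df)
```

`rbindlist()` is usually faster on many large tables and can fill missing columns with `fill = TRUE`; base `rbind()` requires identical column names.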

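Answer 2 (score: -1)

You can use a for loop to paste together parts of the file names.

Sorry that this is only a rough, "code-like" outline, but you did not provide any information about the process that turns "Data_" into "Results_".

However, for loops are often considered slow in R, so a next step could be to use the walk() function from the purrr package.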
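A minimal sketch of the for-loop idea, assuming the files differ only by a numeric ID (the IDs below are hypothetical). `paste0()` builds both the input and the output name for each ID; no files are read or written here:

```r
ids <- c("203001", "203002", "203003")  # hypothetical file IDs

in_names  <- character(length(ids))
out_names <- character(length(ids))
for (k in seq_along(ids)) {
  in_names[k]  <- paste0("Data_", ids[k], ".xlsx")
  out_names[k] <- paste0("Results_", ids[k], ".xlsx")
  # here you would read in_names[k], transform it and write out_names[k]
}

print(out_names)
```

`purrr::walk(ids, function(id) { ... })` runs the same body once per ID purely for its side effects (such as writing files), which matches this task.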

Answer 3 (score: -1)

I would use the dir() function to get the names of the files in the folder/directory.

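library(readxl) # reading
library(WriteXLS) # writing
# only pick up .xlsx files, not everything in the folder
for (i in dir("data_files_folder", pattern = "\\.xlsx$")) {

  cat(i, "\n")

  d <- read_excel(path = file.path("data_files_folder", i))

  # something done with it 

  # save with a "Results_" prefix instead of "Data_"
  WriteXLS(d, ExcelFileName = file.path("keep_files_here", sub("^Data_", "Results_", i)))

}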
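`dir()` is an alias for `list.files()`, and without a `pattern` it returns every file in the folder, including non-Excel files. A small sketch with a throwaway temporary folder (the file names are made up) shows what the `pattern` argument filters out:

```r
# Throwaway folder with two Excel-like names and one unrelated file (made up)
demo_dir <- file.path(tempdir(), "dir_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir,
                                c("Data_203001.xlsx", "Data_203002.xlsx", "notes.txt"))))

all_names  <- dir(demo_dir)                        # every file in the folder
xlsx_names <- dir(demo_dir, pattern = "\\.xlsx$")  # only names ending in .xlsx

print(all_names)
print(xlsx_names)
```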