How to combine two .xls files by the first part of their file names in a large directory using R?

Time: 2018-03-28 15:58:59

Tags: r

I have a directory of 100 .xls files, two per state (one for schools and one for universities). They all follow a similar naming convention:

Alabama - Schools - 2018-03-28
Alabama - Universities - 2018-03-28
Alaska - Schools - 2018-03-28
Alaska - Universities - 2018-03-28
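Because every name follows the same "State - Type - Date" pattern, each one can be split on the " - " separator (with the spaces, so the hyphens inside the date are left alone). A minimal sketch, assuming the files end in `.xls` and that no state name itself contains " - "; `parse_file_name` is a hypothetical helper, not part of any package:

```r
# Hypothetical helper: split "State - Type - Date.xls" into its three fields.
# Assumes an ".xls" extension and no " - " inside the state name.
parse_file_name <- function(files) {
  stem  <- sub("\\.xls$", "", basename(files))   # drop the extension
  parts <- strsplit(stem, " - ", fixed = TRUE)    # split on the separator only
  data.frame(
    state = vapply(parts, `[`, character(1), 1),
    type  = vapply(parts, `[`, character(1), 2),
    date  = vapply(parts, `[`, character(1), 3),
    stringsAsFactors = FALSE
  )
}

parse_file_name(c("Alabama - Schools - 2018-03-28.xls",
                  "Alaska - Universities - 2018-03-28.xls"))
```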

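I would like to first combine the two files for each state into a single .xlsx file, and then rename the tabs in that .xlsx file to "Schools" and "Universities".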

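The output should be 50 .xlsx files (one per state), each with two tabs: "Schools" and "Universities". Each of the 50 files is named after its state alone (e.g., "Alabama.xlsx" and "Alaska.xlsx").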
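The target layout, one workbook per state with tabs named "Schools" and "Universities", can also be produced without the Java-based xlsx package used in the answer below. As a swapped-in alternative, `readxl::read_excel()` reads both .xls and .xlsx files, and `writexl::write_xlsx()` writes a named list of data frames as one sheet per list element, taking each tab name from the list name. A sketch, assuming both packages are installed and that each input file keeps its data on the first sheet; `combine_state` is a hypothetical helper name:

```r
# Hypothetical helper: write <out_dir>/<state>.xlsx with two named tabs.
# Assumes the readxl and writexl packages are installed and that each
# input workbook holds its data on sheet 1.
combine_state <- function(state, schools_file, universities_file, out_dir = ".") {
  sheets <- list(
    Schools      = readxl::read_excel(schools_file, sheet = 1),
    Universities = readxl::read_excel(universities_file, sheet = 1)
  )
  out_file <- file.path(out_dir, paste0(state, ".xlsx"))
  writexl::write_xlsx(sheets, path = out_file)   # list names become tab names
  invisible(out_file)
}
```

For example, `combine_state("Alabama", "Alabama - Schools - 2018-03-28.xls", "Alabama - Universities - 2018-03-28.xls")` would write `Alabama.xlsx` in the working directory. Writing both tabs in one call avoids the create-then-append sequence.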

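Any suggestions on how to accomplish this? I am not clear on how to merge two files by the first part of the file name only. (In the example above, that means merging by "Alabama" alone, not by the rest of the file name.)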
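Merging "by the first part of the file name" amounts to grouping the paths by everything before the first " - ". A minimal base-R sketch, assuming all files follow the naming convention above; `group_by_state` is a hypothetical helper:

```r
# Hypothetical helper: group file paths by the state prefix of their names
group_by_state <- function(files) {
  state <- sub(" - .*$", "", basename(files))   # keep text before the first " - "
  split(files, state)                           # named list, one element per state
}

files <- c("Alabama - Schools - 2018-03-28.xls",
           "Alabama - Universities - 2018-03-28.xls",
           "Alaska - Schools - 2018-03-28.xls",
           "Alaska - Universities - 2018-03-28.xls")
groups <- group_by_state(files)
names(groups)         # "Alabama" "Alaska"
groups[["Alabama"]]   # the two Alabama files
```

With a real directory, `files` would come from something like `list.files("path/to/dir", pattern = "\\.xls$", full.names = TRUE)`, where the path is a placeholder.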

Any help is appreciated.

1 Answer:

Answer 0 (score: 1)

Please try this and let me know how it goes. If it works, I will explain it step by step.

library(xlsx)  # read.xlsx() and write.xlsx(); requires Java

states <- c("Alabama", "Alaska")   # base R's state.name holds all 50 names
ref <- c("Schools", "Universities")

# Build every "<State> - <Type> - 2018-03-28.xls" file name
file_names <- paste0(apply(expand.grid(states, ref), 1, paste, collapse = " - "),
                     " - 2018-03-28.xls")

# Split on " - " (with the spaces) so state and type carry no stray whitespace;
# splitting on "-" alone would leave "Alabama " and also break up the date
parts <- strsplit(file_names, " - ", fixed = TRUE)
read_df <- data.frame(path       = file.path(getwd(), file_names),
                      file_names = file_names,
                      state      = vapply(parts, `[[`, character(1), 1),
                      ref        = vapply(parts, `[[`, character(1), 2),
                      stringsAsFactors = FALSE)
read_df$final_name <- paste0(read_df$state, ".xlsx")

for (s in states) {
  rows <- read_df[read_df$state == s, ]
  # read.xlsx() needs sheetIndex (or sheetName); the data is assumed to be on sheet 1
  schools      <- read.xlsx(rows$path[rows$ref == "Schools"], sheetIndex = 1)
  universities <- read.xlsx(rows$path[rows$ref == "Universities"], sheetIndex = 1)
  out_file <- rows$final_name[1]
  # The first call creates <State>.xlsx with a "Schools" tab...
  write.xlsx(schools, file = out_file, sheetName = "Schools", row.names = FALSE)
  # ...and the second adds a "Universities" tab to the same file
  write.xlsx(universities, file = out_file, sheetName = "Universities",
             append = TRUE, row.names = FALSE)
}
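The answer hard-codes the states and the date. For the full directory of 100 files, the per-state table can instead be built from `list.files()`, with a check that every state has exactly one Schools file and one Universities file before anything is written. A sketch, assuming the names follow the pattern in the question; `find_state_pairs` is a hypothetical helper:

```r
# Hypothetical helper: one row per state with its input paths and output file.
# Assumes names like "Alabama - Schools - 2018-03-28.xls".
find_state_pairs <- function(dir = ".") {
  files <- list.files(dir, pattern = "\\.xls$", full.names = TRUE)
  parts <- strsplit(sub("\\.xls$", "", basename(files)), " - ", fixed = TRUE)
  info  <- data.frame(
    path  = files,
    state = vapply(parts, `[`, character(1), 1),
    type  = vapply(parts, `[`, character(1), 2),
    stringsAsFactors = FALSE
  )
  # Stop early if any state lacks (or duplicates) one of the two files
  counts <- table(info$state, info$type)
  if (!all(c("Schools", "Universities") %in% colnames(counts)) ||
      any(counts[, c("Schools", "Universities")] != 1)) {
    stop("each state needs exactly one Schools and one Universities file")
  }
  states <- sort(unique(info$state))
  pick <- function(type) {
    sel <- info[which(info$type == type), ]
    sel$path[match(states, sel$state)]   # align paths with the sorted states
  }
  data.frame(
    state        = states,
    schools      = pick("Schools"),
    universities = pick("Universities"),
    out_file     = file.path(dir, paste0(states, ".xlsx")),
    stringsAsFactors = FALSE
  )
}
```

Each row can then drive the read/write loop from the answer, with `out_file` as the target. Because the pattern `\\.xls$` does not match names ending in `.xlsx`, combined workbooks written into the same directory are not picked up as inputs on a second run.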