Question

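This is my first attempt at a loop in R. I don't know whether what I'm trying to do is possible, but if it is, I think the answer will be useful to many people who find this question through a Google search.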

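I'm trying to append 89 Excel spreadsheets together. Each workbook has dozens of sheets, and I need to select the one I want and drop the first three rows. I know how to do all of this one document at a time, but with 89 documents, boy, wouldn't it be nice to automate this.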

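One helpful detail is that every document name ends with a date. In my case, each file holds one day of electricity price data. Since the date is in the document name, I was hoping to use a first_date:last_date construct.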

Here is an example of the code that loads a single document:

library(readxl)
MDFD_20170207 <- read_excel("O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/MDFD_20170207.xlsx", 
                            sheet = "Bilateral Indexes", col_names = FALSE, 
                            skip = 3)

Here is my attempt at adapting the material I've read on base R "for" loops to this situation:

for (i in 20170207:20170210){
  print(paste(,i<- read_excel("O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/MDFD_,i.xlsx", 
                              sheet = "Bilateral Indexes", col_names = FALSE, 
                              skip = 3)
  ))
}

It doesn't work, and I get the following error message:

Error in paste(, i <- read_excel("O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/MDFD_,i.xlsx",  : 
  argument is missing, with no default

I'm not sure what this means. For example, which argument does it say is missing?

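I hope I've written enough of an explanation that, if a workable answer is given, other people can save hours of work instead of loading one document at a time when appending a large data set.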

Update: here is another approach I've been working on:

    df <- data.frame()
    full_path <- "O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/"
    docs <- c(20170207:20170209)
    for (f in docs){ 
      filename <- paste0(full_path, f,".xlsx")
      tmp_df <- read_excel(filename, sheetName = "Bilateral Indexes", col_names = FALSE, skip = 3)
      df <- rbind(df,tmp_df)
}

Structurally this all seems to work, but it won't accept my read_excel command:

Error in sheets_fun(path) : 
  Evaluation error: zip file 'O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/20170207.xlsx' cannot be opened.

Answer 1

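You can build the file name string first and then use that variable in the read_excel call. I'm assuming you want to append the data from all files into a single table, and that every file has the same structure (i.e. the same column names)...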

#create data frame to store rows for all Excel files
all.rows <- data.frame()

#loop through files and append data to said data frame     
for (i in 20170207:20170210){
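    filename <- paste("O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/MDFD_", i, ".xlsx", sep = '')
    tmp_table <- read_excel(filename, sheet = "Bilateral Indexes", col_names = FALSE, skip = 3)
    #subset tmp_table to desired columns
    #('peak prices' and 'off-peak prices' are placeholders: with col_names = FALSE readxl
    # auto-generates the column names, so adjust these to match names(tmp_table))
    all.rows <- rbind(all.rows, tmp_table[,c('peak prices','off-peak prices')])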
}

#now you can perform calculations on the data frame [replace <column> with your column name]
mean.var <- mean(all.rows[["<column>"]])
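
One caveat with an integer range such as 20170207:20170210 is that it only works while every date falls within a single month, because counting integers across a month boundary produces values like 20170229 that are not dates. A minimal sketch of generating the names from real dates instead, assuming the same MDFD_<yyyymmdd>.xlsx naming as in the question (the start and end dates below are made up for illustration):

```r
# Build one date stamp per calendar day between two dates (illustrative dates).
dates <- seq(as.Date("2017-02-27"), as.Date("2017-03-02"), by = "day")

# Format each date as yyyymmdd, matching the MDFD_<date>.xlsx naming pattern.
stamps <- format(dates, "%Y%m%d")

base_dir <- "O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/"
filenames <- paste0(base_dir, "MDFD_", stamps, ".xlsx")

# Keep only the files that actually exist, in case some days have no file.
existing <- filenames[file.exists(filenames)]
```

The loop can then iterate over existing instead of over the integer range, so a missing day does not stop the run.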

Answer 2

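When saving objects across iterations, I prefer to use a list to keep the environment clean. You just need to subset with double brackets [[ rather than [. Also, use the paste0 function to combine strings without spaces.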

docs <- c(20170207:20170210)
# initialize a list object to save them to
MDFD <- vector("list", length(docs)) # alternatively you could just write <- list()

# way of combining the string
paste0("O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/MDFD_", docs[1],".xlsx")

for(i in seq_along(docs)){
  # double check that "/P~Port" bit you have there...
  # and you have fixed the date 20170615 too?
  MDFD[[i]] <- read_excel(paste0("O:/Project/P~Port of Seattle/Prices/Mid-C/20170615 Platt's/MDFD_", docs[i],".xlsx"), sheet = "Bilateral Indexes", col_names = FALSE, skip = 3)
}
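
The difference between [ and [[ can be seen on a toy list (made-up values, not the real workbooks): single brackets return a shorter list, while double brackets return the element itself.

```r
# A small list standing in for the list of workbooks.
x <- list(a = 1:3, b = letters[1:2])

single <- x["a"]    # single bracket: a list of length 1 that contains the element
double <- x[["a"]]  # double bracket: the element itself (an integer vector here)

class(single)
class(double)
```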

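Edit

If you want a nicer way to call each workbook from the list, set names(MDFD) <- docs. You can then press Tab after MDFD$ to pick and autocomplete the list element you want, for example: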

MDFD$`20170207`
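
Since the question ultimately asks for one appended table, the list can be stacked once all files have been read. A minimal sketch, using two small made-up data frames in place of the read_excel results and assuming every element has the same columns:

```r
# Stand-ins for the data read from two workbooks (made-up values).
MDFD <- list(
  data.frame(hour = 1:2, price = c(20.5, 21.0)),
  data.frame(hour = 1:2, price = c(19.8, 22.3))
)
names(MDFD) <- c("20170207", "20170208")

# Tag each data frame with the name of the list element it came from.
tagged <- Map(function(d, nm) cbind(source = nm, d), MDFD, names(MDFD))

# Stack all elements into one data frame with a single rbind call.
combined <- do.call(rbind, tagged)
rownames(combined) <- NULL
```

Binding once at the end avoids growing a data frame inside the loop, and the source column keeps track of which day each row belongs to.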

Answer 3

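Here is a possible solution that loops over a set of Excel files in a given directory and loads the contents of a specific sheet into a growing data frame. This may be better than trying to generate every possible file name algorithmically and then dealing with missing files.

I added a column identifying which file each row came from, and also a "log data frame" to track how many rows of data were loaded from each file.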

library(readxl)

# build an empty data frame to hold all the data
df <- data.frame()

# build an empty dataframe to log the results
log_df <- data.frame()

#your path is different
full_path <- "/home/dale/GetExcelFilesR/files/"

# get a list of all the xlsx files in this folder
# (pattern is a regular expression, not a shell glob)
file_list <- list.files(path = full_path, pattern = "\\.xlsx$")

for (f in file_list){
  filename <- paste0(full_path, f)

  #load this file into a temporary dataframe
  tmp_df <- NULL  #make sure it's empty first

  # each sheet had 2 rows at top, 3rd has column names
  tmp_df <- read_excel(filename, sheet = "Specific Sheet Name", col_names = TRUE, skip = 2)

  #add the filename as a column in the data
  tmp_df <- cbind(SourceFile=f, tmp_df)

  #append that temp dataframe to our main dataframe
  df <- rbind(df,tmp_df)

  status_text <- paste0(nrow(tmp_df), " rows read")
  log_df <- rbind(log_df, data.frame(SourceFile=f, Status=status_text))
}

View(log_df)
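
If one file in the folder lacks the sheet or cannot be opened, a loop like the one above stops with an error. One possible extension, sketched here with a hypothetical helper read_with_log() that is not part of readxl, wraps the read in tryCatch so that failures are recorded in the log instead. The reader argument is an assumption made so the sketch runs without any Excel files.

```r
# Hypothetical helper: read each file with `reader`, logging failures instead of stopping.
read_with_log <- function(files, reader) {
  results <- list()
  log_df <- data.frame()
  for (f in files) {
    # Capture the error object rather than letting it abort the loop.
    tmp_df <- tryCatch(reader(f), error = function(e) e)
    if (inherits(tmp_df, "error")) {
      status <- paste0("failed: ", conditionMessage(tmp_df))
    } else {
      status <- paste0(nrow(tmp_df), " rows read")
      results[[f]] <- cbind(SourceFile = f, tmp_df)
    }
    log_df <- rbind(log_df, data.frame(SourceFile = f, Status = status))
  }
  list(data = do.call(rbind, results), log = log_df)
}

# Toy reader for demonstration: fails for any file whose name contains "bad".
toy_reader <- function(f) {
  if (grepl("bad", f)) stop("cannot open file")
  data.frame(value = 1:3)
}

out <- read_with_log(c("a.xlsx", "bad.xlsx", "b.xlsx"), toy_reader)
```

With real files, reader could be something like function(f) read_excel(f, sheet = "Specific Sheet Name", col_names = TRUE, skip = 2), and out$log would then show which files failed and why.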

Is it possible to use an R "for" loop on both the object name and the Excel document name?
