Question

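I need to read multiple xlsx files with random names into a single data frame. Every file has the same structure, and I only need to import specific columns.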

My first attempt read only one file at a time, and I could not select specific columns. I also tried:

dat <- read.xlsx("FILE.xlsx", sheetIndex=1, 
                  sheetName=NULL, startRow=5, 
                  endRow=NULL, as.data.frame=TRUE, 
                  header=TRUE)

But I could not get a loop to work after that. How can I do this? Thanks in advance.

Answer 1

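I would read the first sheet of each file into a list.

Get the file names (the pattern keeps only xlsx files, so other files in the folder are skipped):

f = list.files("./", pattern = "\\.xlsx$")

Read the files: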

library(xlsx)  # provides read.xlsx

dat = lapply(f, function(i){
    x = read.xlsx(i, sheetIndex=1, sheetName=NULL, startRow=5,
        endRow=NULL, as.data.frame=TRUE, header=TRUE)
    # Get the columns you want, e.g. 1, 3, 5
    x = x[, c(1, 3, 5)]
    # You may want to add a column to say which file they're from
    x$file = i
    # Return your data
    x
})

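You can then access the items in the list with:

dat[[1]]

Or apply the same operation to each of them (restricted to numeric columns here, because the file column is text):

lapply(dat, function(x) colMeans(x[sapply(x, is.numeric)]))

Combine them into a single data frame (this is where the file column becomes useful):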

dat = do.call("rbind.data.frame", dat)
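To see how the summary and binding steps behave without any Excel files on hand, here is a minimal sketch in base R. The two small data frames are made-up stand-ins for what read.xlsx would return, and the file names are hypothetical.

```r
# Made-up stand-ins for the per-file results of read.xlsx
dat <- list(
  data.frame(a = 1:2, b = c(0.5, 1.5), file = "one.xlsx",
             stringsAsFactors = FALSE),
  data.frame(a = 3:4, b = c(2.5, 3.5), file = "two.xlsx",
             stringsAsFactors = FALSE)
)

# Per-file column means, computed on the numeric columns only
means <- lapply(dat, function(x) colMeans(x[sapply(x, is.numeric)]))

# Stack all list elements into one data frame; the file column
# records which file each row came from
combined <- do.call(rbind, dat)
print(combined)
```

Because every element has the same columns, rbind can stack them directly. If the files had differing columns, this step would fail and the columns would need to be aligned first.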

Answer 2

I am more comfortable with for loops, though this may be a bit more cumbersome.

filelist <- list.files(pattern = "\\.xlsx$")  # list all xlsx files in the directory

allxlsx.files <- list()  # create a list to populate with xlsx data (if you want to bind all the rows together)
count <- 1
for (file in filelist) {
   dat <- read.xlsx(file, sheetIndex=1, 
              sheetName=NULL, startRow=5, 
              endRow=NULL, as.data.frame=TRUE, 
              header=TRUE) [c(5:10, 12,15)] # index your columns of interest
   allxlsx.files[[count]] <- dat # store this file's data frame in the list
   count <- count + 1
}

Convert back to a data.frame:

allfiles <- do.call(rbind.data.frame, allxlsx.files)
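As an alternative to subsetting after the read, read.xlsx also accepts a colIndex argument, so the columns can be picked at read time. The following is only a sketch: it assumes the xlsx package and a Java runtime are installed, and it writes two made-up workbooks to a temporary folder so that it can run on its own. The column positions and row limits are illustrative, not the asker's.

```r
library(xlsx)  # assumption: xlsx package and Java are available

# Create two small example workbooks (made-up data) in a temp folder
demo_dir <- tempfile("xlsx_demo")
dir.create(demo_dir)
for (n in c("a", "b")) {
  write.xlsx(data.frame(x = 1:3, y = 4:6, z = 7:9),
             file.path(demo_dir, paste0(n, ".xlsx")),
             row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "\\.xlsx$", full.names = TRUE)

# colIndex selects columns 1 and 3 while reading;
# startRow/endRow limit the sheet rows (row 1 holds the header)
parts <- lapply(files, function(f) {
  d <- read.xlsx(f, sheetIndex = 1, startRow = 1, endRow = 3,
                 colIndex = c(1, 3), header = TRUE)
  d$file <- basename(f)  # remember the source file
  d
})

allfiles <- do.call(rbind, parts)
print(allfiles)
```

Selecting columns with colIndex avoids reading cells that are discarded anyway, which may help with large workbooks.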

Answer 3

A variation on Wyldsoul's answer, but using a for loop across multiple worksheets (from 1 to j) within the same Excel file, and binding with dplyr:

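library(gdata)
library(dplyr)

# f (path to the workbook) and j (number of sheets) must be set beforehand
allxlsx.files <- list()  # list to collect one data frame per sheet
count <- 1
for (i in 1:j) {
  dat <- read.xls(f, sheet = i)
  dat <- dat[,1:14] # index your columns of interest
  allxlsx.files[[count]] <- dat # store this sheet's data frame
  count <- count + 1
}

allfiles <- bind_rows(allxlsx.files)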

How to read multiple xlsx files in R using a loop with specific rows and columns
