Question

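I have a script here that grabs a number from a certain column. Now I want to collect it not only from the first sheet of every file in the directory, but from every sheet of every file.

Right now the .csv file that R writes shows 2 columns: column A is the file name and column B is the number R grabbed.

What modifications should I make to the script below so that the csv output shows 3 columns, where A is the file name, B is the sheet name, and C is the number?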

require(xlsx)
#setwd
setwd("D:\\Transferred Files\\")
files <- (Sys.glob("*.xls"))
f<-length(files)

DF <- data.frame(txt=rep("", f),num=rep(NA, f),stringsAsFactors=FALSE)

# files loop
for(i in 1:f)
{
  A<-read.xlsx(file=files[i],1,startColumn=1, endColumn=20, startRow=1, endRow=60)
  #Find price
  B<-as.data.frame.matrix(A)
  P<-B[which(apply(B, 1, function(x) any(grepl("P", x)))),which(apply(B, 2, function(x) any(grepl("P", x))))+6]

  #fill price DF
  DF[i, ] <-c(files[i],P)
}
write.csv(DF, "prices.csv", row.names=FALSE)

I have tried XLConnect, but I really could not solve this.

Answer 1

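You're off to a good start, but what you are asking is how to add the sheets within each file to the loop. If you read ?read.xlsx, you'll see two arguments that you are glossing over in your code (well, using one and ignoring the other):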

Usage:

     read.xlsx(file, sheetIndex, sheetName=NULL, rowIndex=NULL,
       startRow=NULL, endRow=NULL, colIndex=NULL,
       as.data.frame=TRUE, header=TRUE, colClasses=NA,
       keepFormulas=FALSE, encoding="unknown", ...)

Arguments:

    file: the path to the file to read.

sheetIndex: a number representing the sheet index in the workbook.

sheetName: a character string with the sheet name.

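You only need to provide one of the two.

You may ask "how do I know how many sheets there are in a workbook?" (for sheetIndex) or even "what are the sheet names?" (for sheetName). ?getSheets to the rescue: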

Usage:

     getSheets(wb)

Arguments:

      wb: a workbook object as returned by 'createWorksheet' or
          'loadWorksheet'.

Value:

     'getSheets' returns a list of java object references each
     pointing to an worksheet.  The list is named with the sheet
     names.

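You'll need to use loadWorkbook(file) instead of read.xlsx in order to get the sheet names, but a little reading of the manual will give you what you need to make the switch. (You could use something like getSheets(loadWorkbook(file)), but in my experience I try to avoid opening the same file multiple times in the same script, regardless of auto-closing.)

As an alternative, Hadley's readxl package is showing promise in its simplicity, speed, and stability. It has excel_sheets() and read_excel(), which should meet your needs. (In fact, that's all it has ... simplicity is a "Good Thing (tm)".)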
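For the readxl route, a minimal sketch of the three-column output the question asks for might look like the following. `collect_sheets` and its `extract` argument are hypothetical names, not part of readxl; `extract` stands in for whatever logic pulls the number out of one sheet, since the layout of the spreadsheets is not known. The usage line reads the example workbook that ships with readxl and uses `nrow` as a placeholder extractor.

```r
library(readxl)

# Hypothetical helper (not part of readxl): build one row per (file, sheet),
# with a single value pulled out of each sheet by the caller-supplied
# function `extract`.
collect_sheets <- function(files, extract) {
  rows <- list()
  for (fn in files) {
    for (sh in excel_sheets(fn)) {          # sheet names of this workbook
      dat <- read_excel(fn, sheet = sh)     # read one sheet by name
      rows[[length(rows) + 1]] <- data.frame(
        file  = basename(fn),
        sheet = sh,
        value = extract(dat),               # assumed to return one value
        stringsAsFactors = FALSE
      )
    }
  }
  do.call(rbind, rows)                      # stack all rows into one data.frame
}

# Usage: nrow() is only a placeholder for the real "find the number" logic.
res <- collect_sheets(readxl_example("datasets.xlsx"), nrow)
# write.csv(res, "prices.csv", row.names = FALSE)
```

read_excel() handles both .xls and .xlsx files, so the same loop could be pointed at the result of Sys.glob("*.xls") from the question.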

Edit:

    library(XLConnect)
    ## Loading required package: XLConnectJars
    ## XLConnect 0.2-11 by Mirai Solutions GmbH [aut],
    ##   Martin Studer [cre],
    ##   The Apache Software Foundation [ctb, cph] (Apache POI, Apache Commons
    ##     Codec),
    ##   Stephen Colebourne [ctb, cph] (Joda-Time Java library)
    ## http://www.mirai-solutions.com ,
    ## http://miraisolutions.wordpress.com
    ## Attaching package: 'XLConnect'
    ## The following objects are masked from 'package:xlsx':
    ##     createFreezePane, createSheet, createSplitPane, getCellStyle, getSheets, loadWorkbook, removeSheet, saveWorkbook, setCellStyle, setColumnWidth, setRowHeight

    wb1 <- loadWorkbook('Book1.xlsx')
    shts1 <- getSheets(wb1)
    shts1
    ## [1] "Orig" "Sheet2" "Sheet8" "Sheet3" "Sheet4" "Sheet5" "Sheet6" "Sheet7"

    for (ws in shts1) {
      message(ws) # just announcing myself
      dat <- readWorksheet(wb1, ws)
      message(paste(dim(dat), collapse=' x ')) # do something meaningful, not this
    }
    ## Orig
    ## 128 x 11
    ## Sheet2
    ## 128 x 11
    ## Sheet8
    ## 128 x 19
    ## Sheet3
    ## 17 x 11
    ## Sheet4
    ## 128 x 11
    ## Sheet5
    ## 128 x 11
    ## Sheet6
    ## 128 x 11
    ## Sheet7
    ## 128 x 11

Edit #2:

As a more detailed example of the iteration:

    library(XLConnect)
    # 'pattern' is a regular expression, not a glob, so anchor the extension
    for (fn in list.files(pattern="\\.xlsx$")) {
      message('Opening: ', fn)
      wb <- loadWorkbook(fn)   # open each workbook once
      shts <- getSheets(wb)    # character vector of sheet names
      message(sprintf('    %d Sheets: %s', length(shts), paste(shts, collapse=', ')))
      for (sh in shts) {
        dat <- readWorksheet(wb, sh)
        ## do something meaningful with the data
      }
    }

I'm not sure what you're doing with your code (since you never said what any of the spreadsheets contain), but another approach (which I would use in place of the previous double-for example) is to enclose everything in lists:

    # 'pattern' is a regular expression, not a glob, so anchor the extension;
    # simplify=FALSE keeps the result a nested list even when every workbook
    # happens to have the same number of sheets
    dat <- sapply(list.files(pattern='\\.xlsx$'), function(fn) {
      wb <- loadWorkbook(fn)
      sapply(getSheets(wb), function(sh) readWorksheet(wb, sh), simplify=FALSE)
    }, simplify=FALSE)
    str(dat, list.len=2)
    ## List of 4
    ##  $ Book1.xlsx:List of 8
    ##   ..$ Orig  :'data.frame': 128 obs. of 11 variables:
    ##   .. ..$ i : num [1:128] 1 2 3 4 5 6 7 8 9 10 ...
    ##   .. ..$ x : num [1:128] 1606527 7484 437881 1601729 1341668 ...
    ##   .. .. [list output truncated]
    ##   ..$ Sheet2:'data.frame': 128 obs. of 11 variables:
    ##   .. ..$ i : num [1:128] 1 2 3 4 5 6 7 8 9 10 ...
    ##   .. ..$ x : num [1:128] 1606527 7484 437881 1601729 1341668 ...
    ##   .. .. [list output truncated]
    ##   .. [list output truncated]
    ##  $ Book2.xlsx:List of 8
    ##   ..$ Orig  :'data.frame': 128 obs. of 11 variables:
    ##   .. ..$ i : num [1:128] 1 2 3 4 5 6 7 8 9 10 ...
    ##   .. ..$ x : num [1:128] 1606527 7484 437881 1601729 1341668 ...
    ##   .. .. [list output truncated]
    ##   ..$ Sheet2:'data.frame': 128 obs. of 11 variables:
    ##   .. ..$ i : num [1:128] 1 2 3 4 5 6 7 8 9 10 ...
    ##   .. ..$ x : num [1:128] 1606527 7484 437881 1601729 1341668 ...
    ##   .. .. [list output truncated]
    ##   .. [list output truncated]
    ##   [list output truncated]

If you don't care about distinguishing which workbook a particular sheet came from (which in turn simplifies dealing with the data), then you can "flatten" the nested list into a single list.
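One way to do that flattening, shown as a sketch on a small hand-made nested list rather than on real workbooks: the `dat` below is a hypothetical stand-in with the same workbook-then-sheet shape, and `flatdat` is an illustrative name. Base R's unlist() with recursive = FALSE removes only the outer level, so each element stays a data.frame.

```r
# Hypothetical stand-in for the nested list: workbook -> sheet -> data.frame
dat <- list(
  Book1.xlsx = list(Orig = data.frame(i = 1:3), Sheet2 = data.frame(i = 4:5)),
  Book2.xlsx = list(Orig = data.frame(i = 6:9))
)

# Drop one level of nesting only; the data.frames themselves are left intact.
flatdat <- unlist(dat, recursive = FALSE)

# Names are built as "<workbook>.<sheet>"
names(flatdat)
```

Because the workbook and sheet names are pasted together in the element names, the origin of each sheet is still recoverable from the name if it is needed later.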

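Now, dealing with your data may be simpler. Your code for finding "P" is a bit flawed, in that you are assigning a data.frame to a cell in another data.frame, which is generally frowned upon.

This may lead you to another question. For that, I strongly urge you to provide a better-detailed question, including what a sample worksheet looks like and what you expect as output.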
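To make the point about the "P" lookup concrete, here is a hedged sketch of one way to get a single value rather than a data.frame. It assumes, as the question's code appears to, that the wanted number sits 6 columns to the right of the first cell containing "P"; `find_right_of` and the toy `sheet` are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: find the first cell matching `pattern` (searching
# column by column) and return the value `offset` columns to its right as a
# single character value, or NA if there is no match or no such column.
find_right_of <- function(df, pattern = "P", offset = 6) {
  m <- do.call(cbind, lapply(df, as.character))   # character matrix of cells
  hits <- which(matrix(grepl(pattern, m), nrow = nrow(m)), arr.ind = TRUE)
  if (nrow(hits) == 0) return(NA_character_)
  r  <- hits[1, "row"]
  cc <- hits[1, "col"] + offset
  if (cc > ncol(m)) return(NA_character_)
  unname(m[r, cc])
}

# Toy sheet: the label is in column 1, the number in column 7 of the same row.
sheet <- data.frame(a = c("x", "Price", "y"),
                    b = NA, c = NA, d = NA, e = NA, f = NA,
                    g = c(1, 42.5, 3),
                    stringsAsFactors = FALSE)
price <- as.numeric(find_right_of(sheet))
```

A scalar like `price` can then be stored in one cell of a results row, for example with `DF[i, ] <- list(file_name, sheet_name, price)`, where `file_name` and `sheet_name` are whatever the surrounding loop provides.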

How to write R to loop through every sheet of every file in a set directory
