Question

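I work with survey data and want to export a large number of tables (derived from data frames) to an .xlsx or .csv file. I use the xlsx package to do this. The package requires me to specify which column of the Excel file is the first column of each table. Because I want to place several tables side by side in the same file, I need to be able to say that the first column of table n is the start of table (n-1), plus the width of table (n-1), plus x columns of spacing. To do this, I planned to create values like the following.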

Each dt# is obtained by converting a table to a data frame:

table1 <- table(df$y, df$x)
dt1 <- as.data.frame.matrix(table1)

Here I create the values for the starting column numbers:

startcol1 = 1
startcol2 = NCOL(dt1) + 3
startcol3 = NCOL(dt2) + startcol2 + 3
startcol4 = NCOL(dt3) + startcol3 + 3

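And so on. I will probably need to produce somewhere between 50 and 100 tables. Is there a way in R to make this an iterative process, so that I can create 50 starting-column values without writing 50+ lines of code, each one building on the line before it?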

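I have found material on Stack Overflow and other blogs about writing for loops or using apply-type functions in R, but it all seems to deal with manipulating vectors rather than adding values to the workspace. Thanks.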

Answer 1

You can use a structure similar to this.

The list of files you want to read:

file_list = list.files("~/test/", pattern = "\\.csv$", full.names = TRUE)  # pattern is a regular expression, not a glob

For each file, read and process the data frame, and record how many columns the frame you read and processed has:

columnsInEachFile = sapply(file_list,
       function(x)
       {
         df = read.csv(x) # add your appropriate arguments here
         # do any necessary processing you require per file
         return(ncol(df))
       }
)

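The cumulative sum of the column counts, plus 1, gives the starting column of each subsequent data frame when the processed data are placed next to one another (the first data frame starts at column 1):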

columnsToStartDataFrames = cumsum(columnsInEachFile)+1
columnsToStartDataFrames = columnsToStartDataFrames[-length(columnsToStartDataFrames)] # last value is not the start of a data frame but the end
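
As a minimal sketch of the same idea applied to data frames already in memory, the version below also keeps the start of the first frame and leaves empty columns between neighbours. The list `dfs`, its contents, and the `gap` value are hypothetical and only serve as illustration; a gap of 2 reproduces the `+ 3` pattern from the question when the first table starts at column 1.

```r
# Hypothetical example data: three data frames with 2, 4 and 3 columns.
dfs <- list(
  a = data.frame(x = 1:2, y = 3:4),
  b = data.frame(p = 1:2, q = 1:2, r = 1:2, s = 1:2),
  c = data.frame(u = 1:2, v = 1:2, w = 1:2)
)

gap <- 2  # assumed number of empty columns between neighbouring tables

# Width of every frame, as a named integer vector.
widths <- vapply(dfs, ncol, integer(1))

# Frame 1 starts in column 1; every later frame starts after the
# previous frame's start plus that frame's width plus the gap.
start_cols <- cumsum(c(1, head(widths, -1) + gap))
names(start_cols) <- names(dfs)

start_cols
#>  a  b  c
#>  1  5 11
```

The result is a single named vector, so `start_cols[["b"]]` or `start_cols[i]` can be used in place of separate `startcol1`, `startcol2`, ... variables in the workspace.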

Answer 2

Assuming tab.lst is a list containing your tables, you can do this:

cumsum(c(1, sapply(head(tab.lst, -1), ncol)))

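Basically, what I'm doing here is looping through all the tables except the last one (since its starting column is determined by the second-to-last), and using ncol to get the width of each table. I then take the cumulative sum of that vector to get all the starting positions.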

Here is how I created the tables (one table for every possible pair of columns in df):

df <- replicate(5, sample(1:10), simplify = FALSE)  # list of 5 columns, used like a data frame
names(df) <- tail(letters, 5)                       # name the cols
name.combs <- combn(names(df), 2)                   # get all 2 col combinations
tab.lst <- lapply(                                  # make tables for each 2 col combination
  split(name.combs, col(name.combs)),               #   loop through every column in name.combs
  function(x) table(df[[x[[1]]]], df[[x[[2]]]])     #   ... and make a table
)
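
To connect the start positions back to the export step in the question, here is a rough sketch of writing a list of tables side by side with the xlsx package. The small `tabs` list, the `gap` value, the sheet name, and the output path are placeholders for illustration. The width calculation assumes `addDataFrame()` is left at its default of writing row names, which occupy one extra leading column per table.

```r
# Small stand-in tables; in practice this would be a list such as tab.lst.
tabs <- list(
  t1 = table(c("a", "b", "a"), c("x", "y", "y")),
  t2 = table(c("a", "b", "c"), c("x", "y", "z"))
)
dts <- lapply(tabs, as.data.frame.matrix)

gap <- 2  # assumed number of empty columns between tables

# Row names are written as an extra leading column by default,
# so each block occupies ncol + 1 sheet columns.
widths <- vapply(dts, ncol, integer(1)) + 1L
start_cols <- cumsum(c(1, head(widths, -1) + gap))

# Only attempt the export if the xlsx package (and its Java backend) loads.
if (requireNamespace("xlsx", quietly = TRUE)) {
  wb <- xlsx::createWorkbook()
  sheet <- xlsx::createSheet(wb, sheetName = "tables")
  for (i in seq_along(dts)) {
    xlsx::addDataFrame(dts[[i]], sheet,
                       startRow = 1, startColumn = start_cols[i])
  }
  xlsx::saveWorkbook(wb, file.path(tempdir(), "tables.xlsx"))
}
```

If row names are switched off with `row.names = FALSE`, the `+ 1L` in the width calculation should be dropped.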

Creating a function in R with iterative code
