Question

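I'm trying to write a for loop in R that reads a list of file names from a directory, converts each file to a data frame, and concatenates them into one large data frame, adding an identifier to each data frame so that I know which file produced which data when plotting. So far, I have a for loop that runs a function which appends each data frame to an empty data frame I initialised earlier, like so: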

filenames <- list.files(path="reads/metrics", pattern="*.txt", all.files=T, recursive=FALSE, full.names = TRUE)
n= 0
pesto = data.frame(size=character(), fcount= character(),rcount=character(), total = character(), Identifier= character())

concat = function(filename, n){
    dat = read.table(filename, header=TRUE, na.strings="EMPTY")
    dat_i = transform(dat, Identifier = rep((paste("time", n, sep="")), nrow((dat))))
    pesto <<- rbind(dat_i)
}

for (f in filenames) {
n = n+1
concat(f, n)
}

So, for two example data frames, which look like this after being read in:

> df1 (from file of Time = 1)
         size     fcount     rcount   total
[1,]       1        2           3         5
[2,]       4        1           1         2
[3,]       5        1           2         3

> df2 (from file of Time = 2)
         size     fcount     rcount   total
[1,]       1        3           6         9
[2,]       3        1           5         6
[3,]       5        1           2         3

the desired output would look like:

> pesto
         size     fcount     rcount   total    Identifier
[1,]       1        2           3         5        time1
[1,]       1        3           6         9        time2
[2,]       3        1           5         6        time2
[2,]       4        1           1         2        time1
[3,]       5        1           2         3        time1
[3,]       5        1           2         3        time2

Instead, my output is just df2, but labelled!

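So far, while debugging, I've had the function print(n) to make sure I'm iterating through the loop correctly, and it gives me the expected output: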

[1] 1
[1] 2

I'm at a complete loss as to how to get this working, and concatenating the files by hand is a pain!

Answer 1

Instead of a for loop, you can use lapply. (I know the *apply functions are loops in disguise, but they are generally considered better R code.)

files_list <- lapply(filenames, read.table, header=TRUE, na.strings="EMPTY")
pesto <- lapply(seq_along(files_list), function(n){
                x <- files_list[[n]]
                x$Identifier <- paste0("time", n)
                x
            })
pesto <- do.call(rbind, pesto)
pesto <- pesto[order(pesto$size), ]
pesto
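
As a side note on why the original loop keeps only the last file: `rbind(dat_i)` is called with a single argument, so each call overwrites `pesto` with the current file instead of appending to it. If you would rather keep the for loop, a minimal sketch of a corrected version might look like the following. The temporary directory and the file names `t1.txt` and `t2.txt` are hypothetical stand-ins for the files in `reads/metrics`, built from the two example data frames in the question. The sketch also uses the regular expression `"\\.txt$"`, since `pattern` in `list.files` takes a regular expression rather than a glob.

```r
# Hypothetical stand-ins for the real files in reads/metrics
tmp_dir <- tempfile("metrics")
dir.create(tmp_dir)
df1 <- data.frame(size = c(1, 4, 5), fcount = c(2, 1, 1),
                  rcount = c(3, 1, 2), total = c(5, 2, 3))
df2 <- data.frame(size = c(1, 3, 5), fcount = c(3, 1, 1),
                  rcount = c(6, 5, 2), total = c(9, 6, 3))
write.table(df1, file.path(tmp_dir, "t1.txt"), row.names = FALSE, quote = FALSE)
write.table(df2, file.path(tmp_dir, "t2.txt"), row.names = FALSE, quote = FALSE)

# pattern is a regular expression, so match a literal ".txt" at the end
filenames <- list.files(path = tmp_dir, pattern = "\\.txt$", full.names = TRUE)

pesto <- data.frame()  # empty accumulator
for (n in seq_along(filenames)) {
  dat <- read.table(filenames[n], header = TRUE, na.strings = "EMPTY")
  dat$Identifier <- paste0("time", n)  # label rows with the file's position
  pesto <- rbind(pesto, dat)           # append to the accumulator, not overwrite
}

# Sort by size so rows from different files interleave as in the question
pesto <- pesto[order(pesto$size), ]
pesto
```

Passing `pesto` as the first argument to `rbind` is the only change needed for the append to work; looping over `seq_along(filenames)` simply removes the need for a separate counter and the `<<-` assignment. Growing a data frame inside a loop copies it on every iteration, which is one reason the `lapply` plus `do.call(rbind, ...)` version above is usually preferred for many files.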

Iteratively concatenating and labelling data frames in R
