Question

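I want to create a function that loops over a large number of files, counts the complete cases in each file, and then appends a new row to an existing data frame containing the file's "ID" number and its corresponding number of complete cases.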

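Below is the code I wrote, which only returns the last row of the data frame. I believe my function returns only the last row because R overwrites my data frame on every iteration of the loop, but I'm not sure. I did a lot of research online on how to fix this, but I couldn't find a simple solution (I'm very new to R).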

Below you can see my code and the output:

complete <- function(directory = "specdata", id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE) # creates a list of files

  dat <- data.frame() # creates an empty data frame

  for (i in id) {

    data <- read.csv(files_list[i]) # reads the file "i" in the id vector

    nobs <- sum(complete.cases(data)) # counts the number of complete cases in that file

    data_frame <- data.frame("ID" = i, nobs) # here I want to store the number of complete cases in a data frame

    output <- rbind(dat, data_frame) # here the data_frame should be added to an existing data frame
  }

  print(output)
}

When I run complete( , 3:5), I get the following result:

  ID nobs
1  5  402
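The symptom can be reproduced without any files. The sketch below uses made-up nobs values (an assumption purely for illustration) with the same pattern as the loop above: because dat is never reassigned, every rbind() call starts again from an empty data frame, so output ends up holding only the row from the final iteration.

```r
# Toy reproduction of the loop pattern in the question; nobs values are made up.
dat <- data.frame()  # never reassigned inside the loop
for (i in 3:5) {
  row <- data.frame(ID = i, nobs = i * 10L)  # hypothetical count for file i
  output <- rbind(dat, row)  # empty `dat` + current row; `output` is replaced each time
}
print(output)  # a single row, for ID 5
```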

Thanks a lot for your help! :)

Answer 1

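As Maxim.K said, there are better ways to do this, but the actual problem here is that your output variable is overwritten on every iteration of the for loop: dat itself is never updated, so rbind(dat, data_frame) always combines an empty data frame with the current row. Try: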

dat <- rbind(dat, data_frame)

and print dat instead of output.
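Applied to the whole function, the fix might look roughly like the sketch below. It assumes the CSV files are the only files in directory and that their alphabetical order matches their ID numbers (as with names like 001.csv, 002.csv, ...); the demo files written to tempdir() are invented test data. The second function, complete_list (a hypothetical name), shows one commonly used alternative to growing a data frame with rbind() inside a loop: collect one small data frame per file in a list and bind them once with do.call(rbind, ...). It is offered as an example of a "better way", not necessarily the one Maxim.K had in mind.

```r
# Answer 1's fix: reassign the accumulator `dat` on every iteration.
complete <- function(directory = "specdata", id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)  # sorted file paths
  dat <- data.frame()                                     # starts empty
  for (i in id) {
    data <- read.csv(files_list[i])                       # i-th file in sorted order
    nobs <- sum(complete.cases(data))                     # rows without any NA
    dat <- rbind(dat, data.frame(ID = i, nobs = nobs))    # grow `dat` itself
  }
  dat  # returned, and auto-printed when called at the console
}

# Alternative sketch (hypothetical name): one data frame per file, bound once.
complete_list <- function(directory = "specdata", id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  rows <- lapply(id, function(i) {
    data <- read.csv(files_list[i])
    data.frame(ID = i, nobs = sum(complete.cases(data)))
  })
  do.call(rbind, rows)  # a single rbind over all rows
}

# Invented demo data: file k has k complete rows plus one row containing NA.
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:5) {
  write.csv(data.frame(x = c(seq_len(k), NA)),
            file.path(demo_dir, sprintf("%03d.csv", k)), row.names = FALSE)
}

res <- complete(demo_dir, 3:5)
res_list <- complete_list(demo_dir, 3:5)
print(res)
```

With the demo files, complete(demo_dir, 3:5) should return three rows (ID 3, 4 and 5), one per requested file. Calling rbind() inside the loop copies the growing data frame on each iteration, which is fine for a few hundred files but gets slower as the number of rows grows; the list-based version avoids that by binding only once.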

Answer 2

Instead of for (i in id) {, try for (i in 1:322) { or for (i in 1:length(id)) { at the start of the loop.
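When the loop runs over positions, as with 1:length(id), the file number has to be looked up as id[i]; otherwise position 1 would read the first file rather than the first requested ID. The loop header by itself also does not address the overwritten output variable, so the sketch below stores each count as well: it preallocates a vector, fills it by position, and builds the data frame once at the end. seq_along(id) is used in place of 1:length(id) because it also behaves correctly when id is empty. The function name complete_by_position and the demo files are hypothetical.

```r
# Hypothetical position-based variant: loop over positions, look up IDs via id[i].
complete_by_position <- function(directory = "specdata", id = 1:332) {
  files_list <- list.files(directory, full.names = TRUE)
  nobs <- integer(length(id))             # preallocated, one slot per requested ID
  for (i in seq_along(id)) {              # safer than 1:length(id) when id is empty
    data <- read.csv(files_list[id[i]])   # position i -> file number id[i]
    nobs[i] <- sum(complete.cases(data))  # store the count instead of overwriting
  }
  data.frame(ID = id, nobs = nobs)        # build the result once
}

# Invented demo data: file k has k complete rows plus one row containing NA.
demo_dir <- file.path(tempdir(), "specdata_positions")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:5) {
  write.csv(data.frame(x = c(seq_len(k), NA)),
            file.path(demo_dir, sprintf("%03d.csv", k)), row.names = FALSE)
}

res_pos <- complete_by_position(demo_dir, 3:5)
print(res_pos)
```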

Appending data to a data frame in a loop - function returns only the last row of the data frame
