Question

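The function below loops over several CSV files and returns a data frame containing each file's id and the number of complete rows (rows with no missing values) in that file. Although I assign column names (id and nobs) to complete_rows at the start, the returned data frame does not have those names. Why is that?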

complete <- function(directory, id = 1:332) {
    #navigate to directory
    setwd(directory)

    #keep track of row name and number of completed rows
    complete_rows <- data.frame(id=numeric(0), nobs=numeric(0))  

    #csv names
    myfiles <- list.files(pattern = "csv")

    #loop through files
    for(i in id) {

        #read each file
        current_dataset <- read.csv(myfiles[i])

        #include only files with complete datasets
        good_rows <- current_dataset[complete.cases(current_dataset),]

        #push id and number of good rows to data frame
        complete_rows <- rbind(complete_rows, c(i, nrow(good_rows)))

        #increment loop
        i <- i + 1
    }
    #return data frame
    complete_rows
}

Answer 1

I'm not sure why you are seeing this behavior, but I would suggest a few adjustments to your code, as follows:

complete <- function(directory, id = 1:332) {

#navigate to directory
setwd(directory)

#keep track of row name and number of completed rows
complete_rows <- data.frame(id=numeric(length(id)), nobs=numeric(length(id)))  

#csv names
myfiles <- list.files(pattern = "csv")

#loop through positions in id, so subsets such as id = c(2, 4) also work
for(k in seq_along(id)) {

  #read each file
  current_dataset <- read.csv(myfiles[id[k]])

  # write id
  complete_rows$id[k] <- id[k]

  # write nobs
  complete_rows$nobs[k] <- sum(complete.cases(current_dataset))

  }

#return data frame
return(complete_rows)
}

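If all you want is the id and the number of observations, you don't need rbind. To return something from a function, you can either use return or nothing at all (in which case, as far as I know, the last evaluated expression is returned). You can also initialize complete_rows with the number of rows you need, since you know it in advance. And you don't need to increment i manually inside the loop, because for(i in id) already steps through the values.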
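A minimal sketch of the point about return, assuming nothing beyond base R. The function names with_return and without_return are made up for illustration.

```r
# Both functions return the same value; the names are illustrative only.
with_return <- function(x) {
    return(x * 2)
}

without_return <- function(x) {
    x * 2   # the value of the last evaluated expression is returned
}

print(identical(with_return(21), without_return(21)))
```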

Does this work for you?

Edit/Note:

It would probably be better to read all the files into a list at once and then operate on them.
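One way the list-based approach might look is sketched below. The function name complete_list, the temporary directory, and the two small demo files are assumptions made only so the example runs on its own; they are not part of the original question.

```r
# Hypothetical demo data: two small CSV files in a temporary directory.
demo_dir <- file.path(tempdir(), "complete_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(a = c(1, NA, 3), b = c(4, 5, 6)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(a = c(1, 2), b = c(NA, NA)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# complete_list() is an illustrative name, not code from the question.
complete_list <- function(directory, id = 1:332) {
    # full.names = TRUE avoids having to call setwd()
    myfiles <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)

    # read the requested files into a list of data frames
    datasets <- lapply(myfiles[id], read.csv)

    # count the complete rows in each data frame
    nobs <- vapply(datasets, function(d) sum(complete.cases(d)), integer(1))

    data.frame(id = id, nobs = nobs)
}

result <- complete_list(demo_dir, id = 1:2)
print(result)
```

Because the data frame is built in a single call with named arguments, the column names id and nobs are set explicitly and no row-by-row rbind is needed.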

Answer 2

Use rbind on two data.frames that have the same names:

complete_rows <- rbind(complete_rows, data.frame(id=i, nobs=nrow(good_rows)))
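A small sketch of why this helps, using made-up counts instead of real files. According to the rbind documentation, the data frame method first drops zero-row arguments, so the empty complete_rows contributes nothing on the first iteration and the names have to come from the other argument. A bare vector such as c(i, nrow(good_rows)) carries no names, whereas a one-row data.frame does.

```r
# Empty data frame with the intended column names, as in the question.
complete_rows <- data.frame(id = numeric(0), nobs = numeric(0))

# Appending a bare vector: the vector has no names to contribute.
bad <- rbind(complete_rows, c(1, 117))
print(names(bad))   # auto-generated names; the exact form may depend on the R version

# Appending one-row data frames that carry the names themselves.
# The counts 117 and 1041 are made-up values for illustration.
good <- rbind(complete_rows, data.frame(id = 1, nobs = 117))
good <- rbind(good, data.frame(id = 2, nobs = 1041))
print(names(good))
```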

Your code is also not very idiomatic R, as beginneR's answer already covers.

In R, I can't assign column names to a data frame
