Question

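I'm a complete R newbie trying to learn R through Coursera. I'm trying to write a function that calculates the mean across 332 different csv files. I get the correct values, but the output is wrong: I should get the mean for only one of the factors, but I get the mean for both.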

#Assign the directory
pollutantmean <- function (directory, pollutant, id = 1:332) {
  
  directory <- list.files(path= "/Users/......./specdata")

  #Create an empty list
  g <- list()

  #For loop to run through the files and get info and use rbind to create df
  for(i in 1:length(directory)) {

    g[[i]] <- read.csv(directory[i],header=TRUE)

  }

  rbg <- do.call(rbind,g)

  #Subset to get the sulfate/nitrate columns and calculate the mean
  pollutant <- subset(rbg,ID %in% id ,select = c("sulfate","nitrate"))
  colMeans(pollutant,na.rm = TRUE) 
  
}

pollutantmean("specdata","sulfate",70:72) 

sulfate   nitrate 
0.9501894 1.7060474

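So far so good... the values are correct. The problem, however, is that since I'm passing "sulfate" as the pollutant, I should only get the sulfate mean. Instead, I get both. Why is that? What am I doing wrong here?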

Thanks,

Answer 1

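Looking closely at the code, you have hard-coded c("sulfate", "nitrate") into the function, so the pollutant argument is never used to choose the column and you always get both means back. Values that are passed to the function as arguments should not be hard-coded like this. The same applies to directory, which your function overwrites with a fixed path.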
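The difference is easy to see on a small example. This is only a sketch: the `toy` data frame and its values are made up for illustration and are not taken from specdata.

```r
# Made-up data standing in for the combined monitor data (hypothetical values).
toy <- data.frame(
  ID      = c(1, 1, 2, 2),
  sulfate = c(1.0, 3.0, NA, 5.0),
  nitrate = c(2.0, 4.0, 6.0, NA)
)

pollutant <- "sulfate"

# Hard-coded column names: both columns come back, whatever `pollutant` holds.
both <- colMeans(
  subset(toy, ID %in% 1:2, select = c("sulfate", "nitrate")),
  na.rm = TRUE
)

# Using the variable: only the requested column is kept.
one <- colMeans(
  subset(toy, ID %in% 1:2, select = pollutant),
  na.rm = TRUE
)

print(both)  # named vector with two entries: sulfate and nitrate
print(one)   # named vector with one entry: sulfate
```

Note that select = pollutant works here because the data frame has no column that is itself called pollutant.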

Change your function to the following:

#Mean of one pollutant across the selected monitor IDs
pollutantmean <- function (directory, pollutant, id = 1:332) {
  
  #List the files with their full paths so read.csv can find them
  #regardless of the current working directory
  directory_files <- list.files(path = directory, full.names = TRUE)

  #Create an empty list
  g <- list()

  #For loop to read each file into the list
  for(i in seq_along(directory_files)) {

    g[[i]] <- read.csv(directory_files[i],header=TRUE)

  }

  #Combine the per-file data frames into one df
  rbg <- do.call(rbind,g)

  #Subset to the requested pollutant column and calculate the mean
  pollutant_subset <- subset(rbg,ID %in% id ,select = pollutant)
  colMeans(pollutant_subset,na.rm = TRUE) 
  
}

pollutantmean("/Users/......./specdata","sulfate",70:72) 

sulfate
0.9501894

You should now get the correct result.
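If you want to try the idea without the real data, here is a rough, self-contained sketch. Everything in it is an assumption made for illustration: the temporary `specdata_demo` folder, the three tiny CSV files and their values, and the name `pollutantmean_sketch`. It also returns a plain number via mean() rather than the named vector that colMeans() gives.

```r
# Hypothetical stand-in for the specdata folder: three tiny CSV files
# written to a temporary directory (values are invented).
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:3) {
  write.csv(
    data.frame(sulfate = c(k, NA), nitrate = c(10 * k, 20 * k), ID = k),
    file.path(demo_dir, sprintf("%03d.csv", k)),
    row.names = FALSE
  )
}

pollutantmean_sketch <- function(directory, pollutant, id = 1:332) {
  # full.names = TRUE returns paths that read.csv can open from any
  # working directory
  files <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)

  # Read every file and stack the resulting data frames
  combined <- do.call(rbind, lapply(files, read.csv))

  # Pick the column by name and average it, ignoring missing values
  mean(combined[combined$ID %in% id, pollutant], na.rm = TRUE)
}

result <- pollutantmean_sketch(demo_dir, "sulfate", 2:3)
print(result)
```

Because the column is selected with the pollutant argument, calling the same function with "nitrate" returns the nitrate mean instead.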

R: function output, wrong number of factors in the output
