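I'm a complete R newbie, and I'm trying to learn R through Coursera... I'm trying to write a function that calculates the mean across 332 different csv files. I get the correct values, but the wrong output: I should get the mean of only one of the pollutants, but I get the mean of both.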
#Assign the directory
pollutantmean <- function (directory, pollutant, id = 1:332) {
  directory <- list.files(path= "/Users/......./specdata")
  #Create empty list
  g <- list()
  #For loop to run through the files, read each one, and use rbind to create a df
  for(i in 1:length(directory)) {
    g[[i]] <- read.csv(directory[i],header=TRUE)
  }
  rbg <- do.call(rbind,g)
  #Subset to get the sulfate/nitrate columns and calculate the mean
  pollutant <- subset(rbg,ID %in% id ,select = c("sulfate","nitrate"))
  colMeans(pollutant,na.rm = TRUE)
}
pollutantmean("specdata","sulfate",70:72)
sulfate nitrate
0.9501894 1.7060474
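So far, so good... the values are correct. The problem is that, since I pass "sulfate" as pollutant, I should get only the sulfate mean. Instead, I get both. Why is that? What am I doing wrong here?
Thanks,
Answer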
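Looking at your code carefully, you have hard-coded c("sulfate", "nitrate") into the select argument of subset(), so the function always returns both columns no matter what you pass as pollutant. The same line then overwrites the pollutant argument with the subset result, and directory is overwritten with a fixed path, so neither argument is ever actually used. Use the arguments passed to the function directly instead of hard-coding values like this.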
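A minimal sketch on a tiny made-up data frame (the values are invented for illustration and are not from specdata) shows the difference: with the column names hard-coded, select returns both columns, while passing the variable returns only the requested one.

```r
# Toy data frame standing in for the combined specdata files (made-up values)
toy <- data.frame(
  Date    = c("2003-01-01", "2003-01-02", "2003-01-03"),
  sulfate = c(1, NA, 3),
  nitrate = c(2, 4, NA),
  ID      = c(70L, 70L, 71L)
)

pollutant <- "sulfate"

# Hard-coded: the value of pollutant is never consulted, so both columns come back
hard_coded <- subset(toy, ID %in% 70:72, select = c("sulfate", "nitrate"))
print(colMeans(hard_coded, na.rm = TRUE))

# Using the variable: only the column named in pollutant is selected
from_arg <- subset(toy, ID %in% 70:72, select = pollutant)
print(colMeans(from_arg, na.rm = TRUE))
```

The first print shows a named vector with both sulfate and nitrate; the second shows only sulfate.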
Change your function to the following:
#Compute the mean of one pollutant over the monitors listed in id
pollutantmean <- function (directory, pollutant, id = 1:332) {
  #full.names = TRUE returns complete paths, so read.csv() works
  #regardless of the current working directory
  directory_files <- list.files(path = directory, full.names = TRUE)
  #Create empty list to hold one data frame per file
  g <- list()
  #For loop to run through the files, read each one, and use rbind to create a df
  for (i in seq_along(directory_files)) {
    g[[i]] <- read.csv(directory_files[i], header = TRUE)
  }
  rbg <- do.call(rbind, g)
  #Subset to the requested monitors and the requested pollutant column, then calculate the mean
  pollutant_subset <- subset(rbg, ID %in% id, select = pollutant)
  colMeans(pollutant_subset, na.rm = TRUE)
}
pollutantmean("/Users/......./specdata","sulfate",70:72)
sulfate
0.9501894
You should now get the correct result: only the mean of the pollutant you pass in.
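If you want a single plain number back rather than a one-element named vector, one option is to extract the column with [[ and call mean() on it. This is a hedged variant, not the answer's method: pollutantmean2 is a hypothetical name, it assumes every CSV has an ID column plus the requested pollutant column (as in the question), and it only picks up files ending in .csv. The two small files written to tempdir() exist only to make the sketch runnable; their values are invented.

```r
# Hypothetical variant of pollutantmean(): returns a plain numeric mean
pollutantmean2 <- function(directory, pollutant, id = 1:332) {
  # Full paths (assumes the data files end in .csv), so read.csv() works from any directory
  files <- list.files(path = directory, pattern = "\\.csv$", full.names = TRUE)
  # Read every file into a list of data frames, then stack them into one
  all_data <- do.call(rbind, lapply(files, read.csv, header = TRUE))
  # Keep only the rows for the requested monitor IDs
  selected <- all_data[all_data$ID %in% id, , drop = FALSE]
  # [[ extracts the column as a plain vector, so mean() returns one unnamed number
  mean(selected[[pollutant]], na.rm = TRUE)
}

# Usage with two small made-up files in a temporary directory
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = c("2003-01-01", "2003-01-02"),
                     sulfate = c(1, NA), nitrate = c(2, 4), ID = 1L),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = c("2003-01-01", "2003-01-02"),
                     sulfate = c(3, 5), nitrate = c(NA, 6), ID = 2L),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

result <- pollutantmean2(demo_dir, "sulfate", 1:2)
print(result)
```

Using [ and [[ instead of subset() also sidesteps subset()'s non-standard evaluation of select; R's own help page for subset() recommends the standard subsetting operators for programming use.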