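I currently have a lot of Excel files with the same fields but for different industries. I am trying to create a function that gives me the overall mean of the "sulfate" field across all of the Excel sheets.
Here is the code I have so far: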
Mean_Pollution<-function(directory,pollutant,id = 1:332){
directory <- c("001","002","003","004")
for (x in directory){
print(paste("Reading",x,"file"))
temp = read.csv(paste(directory.path,x,".csv",sep = ""))
print(paste("Finished reading",x,"file"))
i = print(mean(temp$sulfate,na.rm = TRUE))
}
}
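Answer 0 (score: 0)
You seem to have everything you need, and there are plenty of similar questions, but I will provide an example. The assumption here is that all of the files are in one folder. I will use my own folder: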
setwd(dir = "C:/Users/Evan Friedland/Documents")
# Put some fake data in a fake folder
dir.create("Test_Folder")
setwd(dir = "C:/Users/Evan Friedland/Documents/Test_Folder")
n <- 5 # let's write 5 csv files
for(i in 1:n){
write.csv(data.frame(madeupX = LETTERS[sample(1:24, 100, TRUE)], sulfate = rnorm(100)), # fake data
paste0(sprintf("%03d", i), ".csv")) # fake names
}
csvnames <- paste0(sprintf("%03d", 1:n), ".csv")
Now, to "stack" the means from each file, initialize an empty vector, loop over the files, and save each result into an element of that vector.
means <- numeric(n) # initialize a numeric vector of length n
names(means) <- csvnames # name each element for fun (the sapply() assignment below replaces this vector, so these names are dropped)
means <- sapply(1:n, function(x){ # used sapply instead of a for loop but either is fine
cat("+ ",csvnames[x],"\n") # print which csv is running
mean(read.csv(paste0(csvnames[x]))$sulfate) # return the mean of the sulfate col
})
#+ 001.csv
#+ 002.csv
#+ 003.csv
#+ 004.csv
#+ 005.csv
means # print results
#[1] 0.007859499 0.077447995 0.048796633 -0.101449790 0.224429258
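The question asks for one overall mean, while the vector above holds one mean per file. Below is a minimal sketch of the for-loop variant that also pools the values. The function name `mean_pollution`, its return structure, and the demo folder are hypothetical and not part of the original answer. It assumes the files are named `001.csv`, `002.csv`, and so on, and that missing readings should be ignored, as in the question's `na.rm = TRUE`.

```r
# Hypothetical helper (not from the original answer): per-file means and the
# pooled mean of one column across zero-padded CSV files such as 001.csv.
mean_pollution <- function(directory, pollutant = "sulfate", id = 1:332) {
  files <- file.path(directory, sprintf("%03d.csv", id))
  per_file <- numeric(length(files))   # preallocate one slot per file
  names(per_file) <- basename(files)
  total <- 0                           # running sum of non-missing values
  count <- 0                           # running count of non-missing values
  for (i in seq_along(files)) {
    values <- read.csv(files[i])[[pollutant]]
    values <- values[!is.na(values)]   # drop missing readings
    per_file[i] <- mean(values)
    total <- total + sum(values)
    count <- count + length(values)
  }
  list(per_file = per_file, overall = total / count)
}

# Usage on small made-up data written to a temporary folder
demo_dir <- file.path(tempdir(), "pollution_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, 2, NA)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(4, 5, 6, 7)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

result <- mean_pollution(demo_dir, "sulfate", id = 1:2)
print(result$per_file)  # one mean per file
print(result$overall)   # pooled mean over every non-missing value
```

The pooled mean weights every reading equally, so it generally differs from the mean of the per-file means when files contain different numbers of valid rows. In the demo data the pooled mean is 25/6, while averaging the two per-file means would give 3.5. Which one counts as the "total average" depends on what the analysis needs.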