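I have 100 csv files, and I want to select the sulfate or nitrate column and compute the sum of the values present in it, as described below.
The CSV format is: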
Date sulfate nitrate ID
1/1/2003 NA NA 1
1/2/2003 NA NA 1
1/3/2003 NA NA 1
1/4/2003 NA NA 1
1/5/2003 NA NA 1
1/6/2003 NA NA 1
1/7/2003 NA NA 1
1/8/2003 NA NA 1
1/9/2003 NA NA 1
1/10/2003 NA NA 1
1/11/2003 NA NA 1
1/12/2003 NA NA 1
1/13/2003 NA NA 1
1/14/2003 NA NA 1
1/15/2003 NA NA 1
1/16/2003 NA NA 1
1/17/2003 NA NA 1
1/18/2003 NA NA 1
1/19/2003 NA NA 1
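All 100 files are in one folder and are named 001.csv, 002.csv ... 100.csv.
The ID here is the name of the csv file. All 100 files have the format shown above.
Here is the code I have written so far: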
pollutantmean <- function(directory,pollutant,id = 1:332)
{
test<- c('sulfate','nitrate')
for(i in seq_along(id))
{
j<-formatC(i, width=3, flag="0")
temp<-"C:/Users/Himanshu/Downloads/rprog-data-specdata/"
temp1<-paste(temp,directory,sep="")
filepath<- file.path(temp1,paste(j,".csv",sep=""))
if(test[1]==pollutant)
{
data<-read.csv(filepath,header = TRUE, sep = "\t",colClasses=c(NA,"sulfate",NA,NA))
sum(x=data,na.rm=FALSE)
}
else if(test[2]==pollutant)
{
data<-read.csv(filepath,header = TRUE, sep = "\t",colClasses=c(NA,NA,"nitrate",NA))
sum(x=data,na.rm=FALSE)
}
data
}
}
I get the following error when I run this statement at the RStudio console:
data<-read.csv(filepath,header = TRUE, sep = "\t")[,c('nitrate')]
Error:
Error in `[.data.frame`(read.csv(filepath, header = TRUE, sep = "\t"), :
undefined columns selected
Another approach I tried is:
data<-read.csv(filepath,header = TRUE, sep = "\t",colClasses=c(NA,"sulfate",NA,NA))
The message in this case is:
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
cols = 1 != length(data) = 4
This is the command the user will enter at the R prompt:
pollutantmean("specdata", "nitrate", 1:72)
Here the first argument is the directory, the second is the name of the column, and the third is the set of CSV file IDs to read.
Answer 0 (score: 0)
pollutantmean <- function(directory, pollutant, id = 1:332) {
  # pollutant must be a character string: "sulfate" or "nitrate"
  # id is numeric and can take values from 1 to 332
  temp <- paste0("C:/Users/Himanshu/Downloads/rprog-data-specdata/", directory)
  total <- 0
  # loop over the ids themselves so that e.g. id = 70:72 reads 070.csv to 072.csv
  for (i in id) {
    j <- formatC(i, width = 3, flag = "0")
    filepath <- file.path(temp, paste0(j, ".csv"))
    # the files are comma-separated, so use sep = "," rather than "\t"
    data <- read.csv(filepath, header = TRUE, sep = ",")
    # add this file's non-missing values to the running total
    total <- total + sum(data[[pollutant]], na.rm = TRUE)
  }
  total
}
# check
pollutantmean("specdata", "sulfate", 1:332)
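As background on why the two attempts in the question fail, here is a minimal sketch. It assumes the files are comma-separated (which is what sep = "," above implies) and uses a small made-up file in a temporary location rather than the real data. With sep = "\t" each line is read as a single field, so there is no nitrate column to select. Separately, colClasses expects class names such as "numeric" or "NULL" (which skips a column), not column names.

```r
# Write a tiny comma-separated file with the same four columns (made-up values).
tmp <- tempfile(fileext = ".csv")
writeLines(c("Date,sulfate,nitrate,ID",
             "1/1/2003,NA,NA,1",
             "1/2/2003,2.5,0.4,1"), tmp)

# Wrong separator: the file contains no tabs, so each line becomes one field.
wrong <- read.csv(tmp, header = TRUE, sep = "\t")
ncol(wrong)                  # a single column
"nitrate" %in% names(wrong)  # FALSE, hence "undefined columns selected"

# Matching separator: four columns, and the pollutant can be selected by name.
right <- read.csv(tmp, header = TRUE, sep = ",")
names(right)
sum(right[["nitrate"]], na.rm = TRUE)

# colClasses takes classes; "NULL" drops a column while reading.
only_sulfate <- read.csv(tmp, colClasses = c("NULL", "numeric", "NULL", "NULL"))
names(only_sulfate)
```

Reading the whole file and then picking the column by name is usually simpler than dropping columns with colClasses, and it works for either pollutant without a branch.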
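Answer 1 (score: 0)
So what this does is: build the list of file names, read all the CSVs into a list, combine that list into a single data.frame, select the requested pollutant column, and summarise it.
I hope this works.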
pollutantmean <- function(directory, pollutant, id = 1:332) {
  library(dplyr)
  # format the ids themselves (not their positions) as 001, 002, ...
  formatC(id, width = 3, flag = "0") %>%
    paste0(".csv") %>%
    file.path("C:", "Users", "Himanshu", "Downloads", "rprog-data-specdata", directory, .) %>%
    # read every file into a list of data frames
    lapply(read.csv, header = TRUE, sep = ",") %>%
    bind_rows() %>%
    # look the column up by the name passed in `pollutant`
    summarise(mean = mean(.data[[pollutant]], na.rm = TRUE)) %>%
    pull(mean)
}
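Since the question asks for a sum rather than a mean, here is a base R sketch of the same pipeline idea that can be run without the original files. The helper name pollutant_sum, the specdata_demo folder, and the data values are all made up for illustration; only the 001.csv naming scheme and the column names come from the question.

```r
# Hypothetical helper: sum one pollutant column over the files selected by `id`.
pollutant_sum <- function(directory, pollutant, id) {
  # Build 001.csv, 002.csv, ... from the ids themselves, not from their positions.
  files <- file.path(directory, paste0(formatC(id, width = 3, flag = "0"), ".csv"))
  # One partial sum per file, ignoring NA values.
  per_file <- vapply(files, function(f) {
    sum(read.csv(f)[[pollutant]], na.rm = TRUE)
  }, numeric(1))
  sum(per_file)
}

# Made-up demo data in a temporary folder (not the real specdata files).
demo_dir <- file.path(tempdir(), "specdata_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (k in 1:3) {
  demo <- data.frame(Date = c("1/1/2003", "1/2/2003"),
                     sulfate = c(NA, k),
                     nitrate = c(k / 2, NA),
                     ID = k)
  write.csv(demo, file.path(demo_dir, sprintf("%03d.csv", k)), row.names = FALSE)
}

pollutant_sum(demo_dir, "sulfate", 1:3)  # 1 + 2 + 3
pollutant_sum(demo_dir, "nitrate", 2:3)  # 1 + 1.5
```

Summing per file and then adding the partial sums avoids holding every file in memory at once, which may matter if the real files are large.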