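I'm very new to R and have hit a wall. I know others have asked this question before, but I'm trying to get my own code working and hope to understand what causes the error.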
The prompt is as follows: Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function should return a numeric vector of length 0. A prototype of this function follows:
corr <- function(directory, threshold = 0) {
  ## 'directory' is a character vector of length 1 indicating the location of
  ## the CSV files
  ## 'threshold' is a numeric vector of length 1 indicating the number of
  ## completely observed observations (on all variables) required to compute
  ## the correlation between nitrate and sulfate; the default is 0
  ## Return a numeric vector of correlations
}
The code I have now sunk 9 hours into is as follows:
spectdata<- list.files(pattern= ".csv") #creates vector with list of filenames
corr<-function(directory,threshold =0, id = 1:332){
  combined<-data.frame() #creates empty data frame
  output<-data.frame()
  output1<-data.frame()
  for(i in id){
    combined<-rbind(read.csv(directory[i], header=TRUE))
    output<-rbind(output,combined) #will open the CSV files and append the tables together
    output1<-output[complete.cases(output), ] #??gets rid of NA in files
    sulfate<-output1["sulfate"] # ?? I think this will be a vector that is a subset of output1 that matches the "sulfate" column
    nitrate<-output1["nitrate"]# ?? I think this will be a vector that is a subset of output1 that matches the "nitrate" column
  }
  ok<-complete.cases(combined) #counts the number of complete cases
  if (threshold>= ok){
    correlation<-cor(data.frame(nitrate,sulfate))
    return(correlation)}
  else {
    print ("nothing!") }
}
cr<-corr(spectdata,threshold =150)
head(cr)
**I'm getting:**
> cr<-corr(spectdata,threshold =150)
Warning message:
In if (threshold >= ok) { :
the condition has length > 1 and only the first element will be used
> head(cr)
           nitrate    sulfate
nitrate 1.00000000 0.06243369
sulfate 0.06243369 1.00000000
The answer for this particular problem where threshold = 150, should be:
source("corr.R")
source("complete.R")
cr <- corr("specdata", 150)
head(cr)
## [1] -0.01895754 -0.14051254 -0.04389737 -0.06815956 -0.12350667 -0.07588814
so it looks like the answer I have is completely wrong ha
Please feel free to provide any insight into 1) how to get a correctly sized vector, and 2) any other syntax or verbiage that might be helpful.
Any and all help is greatly appreciated~
I'm running RStudio on a PC with Windows 10.
Answer 0 (score: 0)
Please find cleaner code below. I'm happy to answer any questions.
corr <- function(directory, threshold = 0) {
  # full paths of the CSV files in the directory (this avoids setwd(), which
  # would leave the working directory changed after the function returns)
  spectdata <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # for each file, read the sulfate and nitrate columns
  L1 <- lapply(spectdata, function(x) read.csv(x, header = TRUE)[, c("sulfate", "nitrate")])
  # for each data frame that was read, remove rows that have NA
  L2 <- lapply(L1, function(x) x[complete.cases(x), ])
  # keep only data frames with more complete cases than the threshold
  L3 <- Filter(function(x) nrow(x) > threshold, L2)
  # if any data frames remain after Filter (length of list > 0) then:
  if (length(L3) > 0) {
    # for each data frame in the list, calculate the correlation between sulfate and nitrate
    Correlation <- lapply(L3, function(x) cor(x[, "sulfate"], x[, "nitrate"]))
    # change list output to a vector output
    unlist(Correlation)
  } else {
    # return a zero-length vector
    numeric(0)
  }
}
corr(directory = "C:/Users/Evan Friedland/Desktop/DIRECTORY", threshold = 100)
Without the comments it looks more concise:
corr <- function(directory, threshold = 0) {
  spectdata <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  L1 <- lapply(spectdata, function(x) read.csv(x, header = TRUE)[, c("sulfate", "nitrate")])
  L2 <- lapply(L1, function(x) x[complete.cases(x), ])
  L3 <- Filter(function(x) nrow(x) > threshold, L2)
  if (length(L3) > 0) {
    Correlation <- lapply(L3, function(x) cor(x[, "sulfate"], x[, "nitrate"]))
    unlist(Correlation)
  } else {
    numeric(0)
  }
}
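As for why the code in the question gave a warning and a 2 x 2 result, here is a minimal sketch on made-up data that may help. The data frame `monitor` and its values are invented for illustration and do not come from the specdata files. `complete.cases()` returns one TRUE/FALSE per row, so `threshold >= ok` is a whole vector rather than a single condition; `sum()` turns it into a count. Similarly, `cor()` on a data frame returns a correlation matrix, while `cor(x, y)` on two columns returns a single number.

```r
# Hypothetical toy data standing in for one monitor file (values are made up)
monitor <- data.frame(
  sulfate = c(1.2, NA, 3.4, 2.2, 5.1),
  nitrate = c(0.5, 0.7, NA, 0.9, 1.4)
)

ok <- complete.cases(monitor)   # logical vector, one element per row
n_complete <- sum(ok)           # single number: how many rows are complete

threshold <- 2
cond_vector <- threshold >= ok          # one element per row: not a valid if() condition
cond_scalar <- n_complete > threshold   # length 1: safe to use inside if()

clean <- monitor[ok, ]
cor_matrix <- cor(clean)                          # 2 x 2 matrix, like the output in the question
cor_value  <- cor(clean$sulfate, clean$nitrate)   # single number for this monitor
```

The question's code also binds all the files into one table before calling `cor()`, so it can only ever produce one result; the expected output has one correlation per monitor, which is why the answer computes `cor()` file by file. Note that in R 4.2.0 and later, an `if()` condition of length greater than one is an error rather than a warning.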