Correlate two columns, sort by monitor ID#, and return a vector of correlations in R

Asked: 2017-07-02 22:41:32

Tags: r vector

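I'm very new to R and have hit a wall. I know some others have already asked about this problem, but I'm trying to get my own code to work so that I can understand what causes the errors.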

The prompt is as follows: Write a function that takes a directory of data files and a threshold for complete cases and calculates the correlation between sulfate and nitrate for monitor locations where the number of completely observed cases (on all variables) is greater than the threshold. The function should return a vector of correlations for the monitors that meet the threshold requirement. If no monitors meet the threshold requirement, then the function should return a numeric vector of length 0. A prototype of this function follows: 
        corr <- function(directory, threshold = 0) {
            ## 'directory' is a character vector of length 1 indicating the location of
            ## the CSV files

            ## 'threshold' is a numeric vector of length 1 indicating the number of
            ## completely observed observations (on all variables) required to compute
            ## the correlation between nitrate and sulfate; the default is 0

            ## Return a numeric vector of correlations

The code I have now sunk 9 hours into is as follows:

            spectdata<- list.files(pattern= ".csv") #creates vector with list of filenames
            corr<-function(directory,threshold =0, id = 1:332){
            combined<-data.frame() #creates empty data frame
            output<-data.frame()
            output1<-data.frame()
              for(i in id){
                combined<-rbind(read.csv(directory[i], header=TRUE))
                output<-rbind(output,combined) #will open the CVS files and append the tables together 
                output1<-output[complete.cases(output), ] #??gets rid of NA in files
                sulfate<-output1["sulfate"] # ?? I think this will be a vector that is a subset of output1 that matches the "sulfate" column 
                nitrate<-output1["nitrate"]# ?? I think this will be a vector that is a subset of output1 that matches the "nitrate" column 

                }
             ok<-complete.cases(combined) #counts the number of complete cases
             if (threshold>= ok){ 
               correlation<-cor(data.frame(nitrate,sulfate))
               return(correlation)}
              else {
               print ("nothing!") }
        }
            cr<-corr(spectdata,threshold =150)     
            head(cr)

        **I'm getting:** 
                > cr<-corr(spectdata,threshold =150)     
                Warning message:
                In if (threshold >= ok) { :
                the condition has length > 1 and only the first element will be used
               > head(cr)
                       nitrate    sulfate
            nitrate 1.00000000 0.06243369
            sulfate 0.06243369 1.00000000

    The answer for this particular problem where threshold = 150, should be: 
        source("corr.R")
        source("complete.R")
        cr <- corr("specdata", 150)
        head(cr)
         ## [1] -0.01895754 -0.14051254 -0.04389737 -0.06815956 -0.12350667 -0.07588814
So it looks like the answer I have is completely wrong.
Please feel free to offer any insight on 1) how to get a correctly sized vector, and 2) any other syntax or terminology that might be helpful.

Any and all help is greatly appreciated~

I'm running RStudio on Windows 10 on a PC.

1 Answer:

Answer 0: (score: 0)

Please find cleaner code below. I'm happy to answer any questions.

corr <- function(directory, threshold = 0) {
  # build the full paths of the CSV files inside the directory
  # (this avoids setwd(), which would change the caller's working directory)
  spectdata <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  # for each file, read the sulfate and nitrate columns
  L1 <- lapply(spectdata, function(x) read.csv(x, header = TRUE)[, c("sulfate", "nitrate")])
  # for each data frame that was read, remove the rows that have NA
  L2 <- lapply(L1, function(x) x[complete.cases(x), ])
  # keep only the data frames with more complete rows than the threshold
  # (the prompt asks for "greater than the threshold", hence > rather than >=)
  L3 <- Filter(function(x) nrow(x) > threshold, L2)
  # if any data frames are left after Filter (length of list > 0) then:
  if (length(L3) > 0) {
    # for each data frame in the list, calculate the correlation between sulfate and nitrate
    Correlation <- lapply(L3, function(x) cor(x[, "sulfate"], x[, "nitrate"]))
    # change the list output to a vector output
    unlist(Correlation)
  } else {
    # return a zero-length numeric vector
    numeric(0)
  }
}

corr(directory = "C:/Users/Evan Friedland/Desktop/DIRECTORY", threshold = 100)
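If you want to convince yourself that the function behaves as the prompt describes before pointing it at the real data, one option is to run it on two tiny made-up files. The following is only a sketch: `corr_sketch`, `demo_dir` and the file contents are hypothetical names and data invented for this check, and the function is a compact restatement of the approach above that assumes a strict "greater than" comparison with the threshold.

```r
# Compact restatement of the approach above so this snippet runs on its own.
# Assumption: the threshold comparison is strict (">"), as the prompt words it.
corr_sketch <- function(directory, threshold = 0) {
  files <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  complete <- lapply(files, function(f) {
    d <- read.csv(f, header = TRUE)[, c("sulfate", "nitrate")]
    d[complete.cases(d), ]
  })
  kept <- Filter(function(d) nrow(d) > threshold, complete)
  if (length(kept) == 0) return(numeric(0))
  unlist(lapply(kept, function(d) cor(d[, "sulfate"], d[, "nitrate"])))
}

# Made-up data: monitor 001 has 4 complete rows, monitor 002 has only 2.
demo_dir <- file.path(tempdir(), "corr_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(sulfate = c(1, 2, 3, 4, NA),
                     nitrate = c(2, 4, 6, 8, 1),
                     ID = 1),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(sulfate = c(1, 2, NA, NA),
                     nitrate = c(3, 1, 5, NA),
                     ID = 2),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

res_low  <- corr_sketch(demo_dir, threshold = 1)   # both monitors qualify
res_mid  <- corr_sketch(demo_dir, threshold = 3)   # only monitor 001 qualifies
res_high <- corr_sketch(demo_dir, threshold = 10)  # no monitor qualifies
```

The result has one element per qualifying monitor, so its length shrinks as the threshold rises and becomes zero when nothing qualifies.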

Without the comments, the function looks more concise:

corr <- function(directory, threshold = 0) {
  spectdata <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
  L1 <- lapply(spectdata, function(x) read.csv(x, header = TRUE)[, c("sulfate", "nitrate")])
  L2 <- lapply(L1, function(x) x[complete.cases(x), ])
  L3 <- Filter(function(x) nrow(x) > threshold, L2)
  if (length(L3) > 0) {
    Correlation <- lapply(L3, function(x) cor(x[, "sulfate"], x[, "nitrate"]))
    unlist(Correlation)
  } else {
    numeric(0)
  }
}
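
As for why the attempt in the question produced a warning and a 2x2 matrix, my reading is as follows. `complete.cases()` returns a logical vector with one element per row, not a count, so `threshold >= ok` is itself a vector and `if` only looks at its first element (from R 4.2.0 onwards this is an error rather than a warning). The comparison is also the wrong way round for "more complete cases than the threshold". In addition, calling `cor()` on a data frame returns the full correlation matrix, while calling it on two vectors returns a single number, and because the loop pools every file into one table, only one correlation is computed for all monitors together. A small sketch with made-up data (the data frame `d` is purely illustrative):

```r
# Made-up data standing in for a single monitor's file
d <- data.frame(sulfate = c(1, 2, NA, 4),
                nitrate = c(2, 1, 5, 3))

ok <- complete.cases(d)   # logical vector, one element per row
n_complete <- sum(ok)     # a single number: how many rows are complete

# cor() on a data frame returns a matrix of all pairwise correlations ...
m <- cor(d[ok, c("nitrate", "sulfate")])
# ... whereas cor() on two vectors returns a single number
r <- cor(d$sulfate[ok], d$nitrate[ok])
```

So the value to compare against the threshold is `n_complete`, and the value to collect for each monitor is `r`, which is the off-diagonal entry of `m`.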