Question

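I am new to programming, and I am writing a function that goes through hundreds of csv files in the working directory.

The files contain a lot of NA values.

The function (which I call corr) takes two arguments: the directory and a threshold (a numeric vector of length 1 giving the number of complete cases).

The purpose of the function is to take the complete cases for the two columns sulfate and nitrate (the second and third columns in the spreadsheet) and to calculate the correlation between them when the number of complete cases is greater than the threshold argument.

The function should return a vector of the correlations for the files that meet the threshold requirement (the default threshold is 0).

When I run the code, I get one of the following two results:

1. A + sign in the console

OR

2. The object I created inside the function cannot be found.

Any help is greatly appreciated. Thank you in advance!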

corr <- function(directory, threshold=0){
  filelist2 <- data.frame(list.files(path=directory,
                                     pattern=".csv", full.names=TRUE))

  corvector <- numeric()

  for(i in 1:length(filelist2)){
    data <- data.frame(read.csv(filelist2[i]))
    removedNA <- complete.cases(data)
    newdata <- data[removedNA, 2:3]

    if(nrow(removedNA) > threshold){
      corvector <- c(corvector, cor(data$sulfate, data$nitrate))
    }
  }
  corvector
}

Answer 1

I don't think your nrow(removedNA) does what you think it does. To reproduce the problem, I use the mtcars dataset.

data <- mtcars # create dataset
data[2:4, 2] <- NA # create some missing values in column 2
data[15:17, 3] <- NA # create some missing values in column 3
removedNA <- complete.cases(data)
table(removedNA) # 6 rows with missing values, as expected
nrow(removedNA) # NULL: removedNA is a logical vector, not a data.frame, so nrow() doesn't work
newdata <- data[removedNA, 2:3] # this works though
nrow(newdata) # and this shows the number of rows in 'newdata'
#---- therefore, instead of nrow(removedNA), try
if(nrow(data)-nrow(newdata) < threshold) {
    ...
}

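Note: I changed the > to < in the threshold line. Which one you want depends on whether the threshold should be an absolute minimum number of rows (in which case you can simply use nrow(newdata) > threshold) or whether it should reflect the difference in the number of rows between the original data and the 'new' data.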
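
Putting the points above together, here is a minimal sketch of how corr could look when the threshold is treated as an absolute minimum number of complete rows. It is one possible reading of the question, not a definitive implementation. It assumes every csv file has columns named sulfate and nitrate in positions 2 and 3, as the question describes. It also keeps the file paths as a plain character vector instead of wrapping them in data.frame(), since length() of a one-column data.frame is 1, and it computes the correlation on newdata rather than on the unfiltered data. The demo directory, file names, and values are made up for illustration. As for the two symptoms: a + prompt usually means R is still waiting for the rest of an expression (for example, a brace or parenthesis that was not closed in what was pasted), and objects assigned inside a function are local to it, so they are not visible from the console afterwards.

```r
# Sketch of a corrected corr(); assumes columns 2 and 3 of every file
# are named "sulfate" and "nitrate", as in the question.
corr <- function(directory, threshold = 0) {
  # Plain character vector of paths, so the loop visits one file at a time.
  filelist <- list.files(path = directory, pattern = "\\.csv$",
                         full.names = TRUE)

  corvector <- numeric()
  for (f in filelist) {
    data <- read.csv(f)
    removedNA <- complete.cases(data)   # logical vector, one entry per row
    newdata <- data[removedNA, 2:3]     # data.frame with complete rows only

    # nrow() works on newdata because it is a data.frame.
    if (nrow(newdata) > threshold) {
      corvector <- c(corvector, cor(newdata$sulfate, newdata$nitrate))
    }
  }
  corvector
}

# Usage on two small, made-up files written to a temporary directory.
demo_dir <- file.path(tempdir(), "corr_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(Date = 1:5,
                     sulfate = c(1, 2, NA, 4, 5),
                     nitrate = c(2, 4, 6, NA, 10)),
          file.path(demo_dir, "001.csv"), row.names = FALSE)
write.csv(data.frame(Date = 1:3,
                     sulfate = c(NA, 1, 2),
                     nitrate = c(3, NA, 1)),
          file.path(demo_dir, "002.csv"), row.names = FALSE)

# 001.csv has 3 complete rows and 002.csv has 1, so only 001.csv
# passes a threshold of 2.
result <- corr(demo_dir, threshold = 2)
print(result)
```

Counting the complete rows with sum(removedNA) would work equally well, because summing a logical vector counts its TRUE values.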
