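I am writing an R function that reads a directory containing 332 .csv files and reports the number of completely observed cases in each data file. The function returns a data frame in which the first column is the name of the file and the second column is the number of complete cases. For example: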
ID OBS
1 233
2 149
etc.
Here is the code I wrote:
complete <- function(directory, id = 1:332) {
  files_full <- list.files(directory, full.names = TRUE)
  nobs <- sum(complete.cases(files_full[id]))
  data <- data.frame(id, nobs)
  return(data)
}
The problem is that when the function runs, it gives a value of 1 for every "nobs" in my column.
Answer 0 (score: 4)
A slightly different approach:
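complete <- function(directory, pattern = "csv$") {
  # Full paths of every file whose name ends in "csv"
  fnames <- list.files(directory, pattern = pattern, full.names = TRUE)
  # Read each file and count the rows that contain no NA
  counts <- vapply(
    fnames,
    function(fname) sum(complete.cases(read.csv(fname))),
    integer(1)
  )
  # Build atomic columns directly; rbind-ing lists would produce list columns
  data.frame(file = fnames, complete = counts, row.names = NULL)
}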
If you want to take id as an argument:
complete <- function(directory, id = 1:332) {
  count_complete <- function(fname) sum(complete.cases(read.csv(fname)))
  fnames <- list.files(directory, full.names = TRUE)[id]
  data.frame(id = id, complete = unlist(lapply(fnames, count_complete)))
}
Answer 1 (score: 3)
sum(complete.cases(files_full[i]))
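doesn't make much sense, and is probably where you went wrong: it counts complete cases in the file name (a character string), not in the data stored in that file.
I would do it like this:
1. Define a function that processes a single dataset: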
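To make the point concrete, here is a minimal sketch of what that call evaluates to. The file name "001.csv" and the small data frame are placeholders invented for illustration; nothing is read from disk.

```r
# A file path is a character vector of length 1, not the data in the file.
path <- "001.csv"  # placeholder name; nothing is read from disk

# complete.cases() treats the string as a single, non-missing observation.
n_from_path <- sum(complete.cases(path))  # 1

# Applied to an actual data frame, rows containing NA are excluded.
d <- data.frame(x = c(1, NA, 3), y = c("a", "b", NA))
n_complete <- sum(complete.cases(d))      # 1: only the first row has no NA
```

So the file has to be read (for example with read.csv) before complete.cases is applied.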
read_and_summarise <- function(f, ...) {d <- read.csv(f, ...) ; sum(complete.cases(d))}
2. Apply this function to all files:
lf <- list.files(directory, full.names = TRUE)
vapply(lf, read_and_summarise, 0L)
(untested)
Answer 2 (score: 3)
Let's go through what your code actually does:
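One way to try the two steps out is on a throwaway directory. The sketch below is only an illustration: the directory name complete_demo, the two small files and the result data frame are assumptions made up for the test, not part of the original data set.

```r
# Same helper as in step 1: read one file, count the rows with no NA.
read_and_summarise <- function(f, ...) {
  d <- read.csv(f, ...)
  sum(complete.cases(d))
}

# Hypothetical test data: two small CSV files in a temporary directory.
directory <- file.path(tempdir(), "complete_demo")
dir.create(directory, showWarnings = FALSE)
write.csv(data.frame(a = c(1, NA, 3), b = c(4, 5, 6)),
          file.path(directory, "001.csv"), row.names = FALSE)
write.csv(data.frame(a = c(NA, NA), b = c(1, 2)),
          file.path(directory, "002.csv"), row.names = FALSE)

# Step 2: apply the helper to every CSV file in the directory.
lf <- list.files(directory, pattern = "\\.csv$", full.names = TRUE)
counts <- vapply(lf, read_and_summarise, integer(1))

# vapply() names the result by file path; basename() gives shorter labels.
result <- data.frame(file = basename(lf), nobs = unname(counts))
result
```

Using integer(1) as the template makes vapply fail loudly if the helper ever returns something other than a single integer.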
complete <- function(directory, id = 1:332) {
  # list files
  files_full <- list.files(directory, full.names = TRUE)
  # create an empty placeholder, to grow sequentially. Known in some circles as R Inferno
  # http://www.burns-stat.com/documents/books/the-r-inferno/
  dat <- data.frame()
  for (i in id) {  # select filenames based on their position in the list
                   # (prone to errors, because it depends on the order)
    dat <- rbind(dat, read.csv(files_full[i]))  # read the data, and append it
                                                # to previous data.frame. Why??
    nobs <- sum(complete.cases(files_full[i]))  # number of complete cases...
                                                # in a character vector of length 1
    data <- data.frame(id, nobs)  # this gets overwritten every time
  }
  data
}
Here is what you probably meant to write:
complete <- function(directory, id = 1:332) {
  # list files
  files_full <- list.files(directory, full.names = TRUE)
  files_toread <- files_full[id]  # filter out unwanted files (tip: ?grep is better)
  output <- data.frame(id = id, nobs = 0)
  for (i in seq_along(id)) {  # loop over positions so any subset of id works
    tmp <- read.csv(files_toread[i])  # read the data
    nobs <- sum(complete.cases(tmp))  # number of complete cases
    output[i, "nobs"] <- nobs
  }
  output
}
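A side note on the loop index: once the file list has been subset by id, its elements are numbered from 1 again, so it should be indexed by position rather than by the values of id. The difference only shows up when id is not 1:n. A short sketch with made-up file names (no files are read):

```r
id <- 5:7
files_full <- sprintf("%03d.csv", 1:10)  # hypothetical directory listing
files_toread <- files_full[id]           # "005.csv" "006.csv" "007.csv"

# Indexing the 3-element subset with the values 5, 6, 7 runs past its end.
by_value <- files_toread[id]             # NA NA NA

# Indexing by position picks the intended files.
by_position <- files_toread[seq_along(id)]
```

With the default id = 1:332 both forms give the same result, which is why the issue is easy to miss.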
Answer 3 (score: 1)
Here is my solution, which seems easier to read:
complete <- function(directory, id = 1:332) {
  filenames <- sprintf("%03d.csv", id)
  filePaths <- paste(directory, filenames, sep = "/")
  nFiles <- length(id)
  output <- matrix(ncol = 2, nrow = nFiles)
  for (i in 1:nFiles) {
    output[i, ] <- c(id[i], sum(complete.cases(read.csv(filePaths[i]))))
  }
  output <- setNames(data.frame(output), c("id", "nobs"))
  output
}
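Note that this version builds the file names from the IDs instead of relying on the order returned by list.files. That assumes the files are named with zero-padded IDs (001.csv up to 332.csv), which the question does not state explicitly. A small sketch of the name construction, using a hypothetical directory called "specdata":

```r
id <- c(1L, 23L, 332L)

# "%03d" pads each integer with leading zeros to a width of three digits.
filenames <- sprintf("%03d.csv", id)  # "001.csv" "023.csv" "332.csv"

# file.path() is a base R alternative to paste(..., sep = "/").
# "specdata" is a made-up directory name used only for illustration.
file_paths <- file.path("specdata", filenames)
```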
Hope this helps someone.
Answer 4 (score: 0)
I think this is simpler and easier to understand:
complete <- function(dir, id = 1:332) {
  # TRUE rather than T: T is an ordinary variable and can be reassigned
  dir <- list.files(dir, full.names = TRUE)
  count <- data.frame()
  for (i in id) {
    ok <- sum(complete.cases(read.csv(dir[i])))
    count <- rbind(count, ok)
  }
  count_table <- cbind(id, count)
  colnames(count_table) <- c("id", "nobs")
  count_table
}