I am new to R. Here is my code:
complete <- function(directory, id = 1:332) {
  # Read through all the csv data file
  for (i in id) {
    i <- sprintf("%03d", as.numeric(i))
    data <- read.csv(paste(directory, "/", i, ".csv", sep = ""))
    good <- complete.cases(data) # Eliminating the NA rows
    cases <- sum(good == TRUE) # add complete value
  }
  data.frame(id = id, nobs = cases)
}
When I print the output, I get (incorrect):
  id nobs
1  1  402
2  2  402
3  3  402
4  4  402
5  5  402
If I print only cases, I get:
[1] 117
[1] 1041
[1] 243
[1] 474
[1] 402
So the correct output should be:
  id nobs
1  1  117
2  2 1041
3  3  243
4  4  474
5  5  402
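I realize that it only keeps the last value of cases.
My question is how to store the cases values in a vector, so that when I call data.frame it returns the correct output.
Thanks.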
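Answer 0 (score: 1)
If id is a numeric vector, then this should do the job (untested, since you did not provide a reproducible example!). Otherwise you should loop with for (i in seq_along(id)) and use id[i] inside the loop.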
complete <- function(directory, id = 1:332) {
  cases <- NULL
  # Read through all the csv data files
  for (i in id) {
    # i becomes a zero-padded character key such as "001"
    i <- sprintf("%03d", as.numeric(i))
    data <- read.csv(paste(directory, "/", i, ".csv", sep = ""))
    good <- complete.cases(data) # TRUE for rows without any NA
    # Store the count under the key i instead of overwriting a single value
    cases[i] <- sum(good == TRUE)
  }
  data.frame(id = id, nobs = cases)
}
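The seq_along(id) variant mentioned in this answer could look roughly like the sketch below. It is not the answerer's tested code: the function name complete_by_position is hypothetical, and it assumes the zero-padded file layout from the question (001.csv, 002.csv, ...) inside directory.

```r
# Hypothetical name; a sketch of the seq_along(id) variant.
complete_by_position <- function(directory, id = 1:332) {
  # Preallocate one slot per requested id
  cases <- integer(length(id))
  for (i in seq_along(id)) {
    # Build the zero-padded file name from the i-th id, e.g. 7 -> "007.csv"
    path <- file.path(directory, sprintf("%03d.csv", as.integer(id[i])))
    data <- read.csv(path)
    # Store the count at position i, so earlier counts are not overwritten
    cases[i] <- sum(complete.cases(data))
  }
  data.frame(id = id, nobs = cases)
}
```

Indexing by position rather than by the value of id keeps cases the same length as id, even when id is something like c(10, 2).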
Answer 1 (score: 1)
Here is a more efficient function for this task:
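complete <- function(directory, id = 1:332) {
  # Build all file paths at once; sprintf and paste0 are vectorized over id
  filenames <- file.path(directory, paste0(sprintf("%03d", id), ".csv"))
  # Read each file and count its complete rows
  data.frame(id = id,
             nobs = sapply(filenames, function(x)
               sum(complete.cases(read.csv(x)))))
}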
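To see how the file names are built without a loop, here is a small illustration. The directory name "specdata" and the sample ids are assumptions chosen only for the example.

```r
# "specdata" is a hypothetical directory name used only for illustration
ids <- c(1, 23, 332)
filenames <- file.path("specdata", paste0(sprintf("%03d", ids), ".csv"))
print(filenames)
# [1] "specdata/001.csv" "specdata/023.csv" "specdata/332.csv"
```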
Answer 2 (score: 0)
complete <- function(directory, id = 1:332) {
  df_total <- data.frame()
  for (x in id) {
    # Build the path to the zero-padded file, e.g. 7 -> "007.csv"
    filename <- file.path(directory, sprintf("%03d.csv", x))
    df <- read.csv(filename, header = TRUE)
    # Count rows with no NA in any column; pass a subset of df
    # to complete.cases to check only the columns you want
    nobs <- sum(complete.cases(df))
    # Append this file's row to the result
    df_total <- rbind(df_total, data.frame(id = x, nobs = nobs))
  }
  df_total
}
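Calling rbind inside the loop builds a new data frame on every iteration, which can get slow with many files. One possible alternative, sketched here under the same file-layout assumption, creates one single-row data frame per file and binds them once at the end. The name complete_rbind is hypothetical.

```r
# Hypothetical name; a sketch that binds all rows in a single rbind call.
complete_rbind <- function(directory, id = 1:332) {
  rows <- lapply(id, function(x) {
    # Read the zero-padded file for this id, e.g. 7 -> "007.csv"
    df <- read.csv(file.path(directory, sprintf("%03d.csv", as.integer(x))))
    # One-row data frame holding the id and its count of complete rows
    data.frame(id = x, nobs = sum(complete.cases(df)))
  })
  # Combine the per-file rows into one data frame
  do.call(rbind, rows)
}
```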