I wrote some code that reads input from a list of files and appends each one to the result with rbind.
dat <- NA
file.names <- list.files(paste(in.path2,"CSV",sep =""))
for (f in file.names) {
  file <- paste(in.path2, "CSV/", f, sep = "")
  tmp <- read.csv(file, stringsAsFactors = F, na.strings = c("", " "))
  if (is.na(dat)) {
    dat <- tmp
  } else {
    colnames(tmp) <- colnames(dat)
    dat <- rbind(dat, tmp)
  }
  print(f)
}
I get this warning:
1: In if (is.na(dat)) { ... :
the condition has length > 1 and only the first element will be used.
How can I fix it?
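Answer 0: (score: 1)
I strongly recommend against growing your data frame like this. Read each file into a list and bind everything once at the end: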
file.names <- list.files(paste(in.path2, "CSV", sep = ""))
input_list <- list()
for (f in file.names) {
  file <- paste(in.path2, "CSV/", f, sep = "")
  # store each file's data frame under its file name
  input_list[[f]] <- read.csv(file, stringsAsFactors = FALSE, na.strings = c("", " "))
  print(f)
}
# bind all data frames in a single call
dat <- do.call(rbind, input_list)
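This is much faster, because calling rbind inside the loop copies the accumulated data frame on every iteration, and you no longer need to test whether dat is NA.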
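As for the warning itself: once dat holds a data frame, is.na(dat) checks every cell and returns a logical matrix, so the if() condition has more than one element (since R 4.2.0 this is an error rather than a warning). The following is only a minimal sketch with a made-up two-row data frame standing in for one file; it also shows that, if the loop is kept, starting from NULL and testing with is.null() avoids the problem.

```r
# Hypothetical stand-in for one file's contents (not from the original post).
tmp <- data.frame(id = 1:2, value = c("a", NA), stringsAsFactors = FALSE)

# is.na() on a data frame returns a logical matrix with one entry per cell,
# so it cannot serve as a single TRUE/FALSE condition for if().
na_check <- is.na(tmp)
print(dim(na_check))  # 2 rows by 2 columns

# If the loop is kept, start from NULL and test with is.null(),
# which always returns exactly one TRUE or FALSE.
dat <- NULL
for (i in 1:3) {
  if (is.null(dat)) {
    dat <- tmp
  } else {
    dat <- rbind(dat, tmp)
  }
}
print(nrow(dat))  # 6 rows: the stand-in data appended three times
```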
Answer 1: (score: 1)
We can do this more easily with lapply, without having to worry about the NA assignment or the if/else clause:
filenames <- list.files(paste0(in.path2,"CSV"), full.names = TRUE)
do.call(rbind,lapply(filenames, read.csv, na.strings = c("", " "), stringsAsFactors = FALSE))
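To try the lapply pattern without the original files, here is a self-contained sketch in base R. The temporary folder, the file names part1.csv and part2.csv, and their contents are all invented for illustration; only the read-and-bind lines mirror the answer.

```r
# Create a throwaway folder with two small CSV files (made-up data).
csv_dir <- tempfile("csv_demo_")
dir.create(csv_dir)
write.csv(data.frame(id = 1:2, value = c("a", NA)),
          file.path(csv_dir, "part1.csv"), row.names = FALSE, na = "")
write.csv(data.frame(id = 3:4, value = c("c", "d")),
          file.path(csv_dir, "part2.csv"), row.names = FALSE)

# full.names = TRUE returns complete paths, so no paste() is needed later.
filenames <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)

# Read every file into a list of data frames, then bind them in one call.
frames <- lapply(filenames, read.csv,
                 na.strings = c("", " "), stringsAsFactors = FALSE)
dat <- do.call(rbind, frames)
print(dat)
```

Because the files are read into a list first, rbind runs once over all of them instead of once per file.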
Another option is fread from data.table:
library(data.table)
rbindlist(lapply(filenames, fread, na.strings = c("", " ")), fill = TRUE)
Or with the tidyverse:
library(tidyverse)
map_df(filenames, read_csv, na = c("", " "))
If the columns are not the same across files, then:
map(filenames, read_csv, na = c("", " ")) %>%
  bind_rows()
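The difference with mismatched columns can be seen on two small in-memory data frames. This is a sketch that assumes the dplyr package is installed; the data frames a and b are invented for illustration.

```r
library(dplyr)

# Two made-up data frames that share only the "id" column.
a <- data.frame(id = 1:2, x = c(10, 20))
b <- data.frame(id = 3L, y = "only in b")

# Base rbind() requires matching column names and stops with an error here.
base_result <- tryCatch(rbind(a, b), error = function(e) "rbind failed")
print(base_result)

# bind_rows() matches columns by name and fills the gaps with NA.
combined <- bind_rows(a, b)
print(combined)
```

Note that the loop in the question overwrote each file's column names by position with colnames(tmp) <- colnames(dat), whereas bind_rows matches columns by name, so the two approaches can give different results when files use different headers.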