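I wrote a simple function that takes an Excel file and does some data cleaning. However, I have to repeat this for every file in a folder. So far the function works on a single file, but when I iterate over the list of files I get the error shown below, and only some of the data frames are output (there should be 333 tables). The goal of the project is to take all the Excel files, clean the data, merge all the data frames together, and then push the result to a database.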
library(readxl)
library(tidyr)
library(dplyr)    # mutate(), filter()
library(magrittr) # %<>%
library(MESS)     # filldown()
library(stringr)
udFunction <- function(loc) {
  test <- read_excel(loc) #read location
  test <- test[-c(1:7),] #removes first 7 rows
  names(test) <- test[1,] #makes the first row into column name
  test <- test[-1,] #removes first row since it's copied to column name
  #Rename all column names
  names(test)[1] = "Time"
  names(test)[2] = "Sample"
  names(test)[3] = "Rename"
  names(test)[4] = "Test"
  names(test)[5] = "Test2"
  names(test)[6] = "Test3"
  names(test)[7] = "Test4"
  names(test)[8] = "Test5"
  names(test)[9] = "Test6"
  names(test)[10] = "Test7"
  names(test)[11] = "Test8"
  names(test)[12] = "Test9"
  names(test)[13] = "Test10"
  names(test)[14] = "Test11"
  names(test)[15] = "Test12"
  names(test)[16] = "Test13"
  names(test)[17] = "Test14"
  names(test)[18] = "Test15"
  names(test)[19] = "Test16"
  #Copy Time column to NewColumn
  test %<>%
    mutate(NewColumn = Time) %>%
    mutate(Date = str_extract(loc, "\\d{6}")) #loc is from the path name
  test$NewColumn <- str_replace(test$NewColumn, "\\d", NA_character_) %>%
    filldown() #sets any value containing a digit to NA, then fills NAs down from the last non-NA value
  test %<>%
    filter(!str_detect(test$Time, "[A-Za-z]")) #drops rows whose Time contains letters
}
loc <- "C:/PATH.../.../2019"
files = list.files(path = loc, pattern = ".xls$", full.names = TRUE) #files is a character vector of 333 path names.
for (i in files) {
  cast = paste("CC", i, sep = "_")
  try(assign(cast, udFunction(i)))
}
I expected a data frame to be output for every file, but I keep getting the same error:
Error in attr(x, "names") <- as.character(value) :
  'names' attribute [1] must be the same length as the vector [0]
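I assume it has something to do with renaming the columns. Everything works when I run the files one at a time, but when I wrap the steps in a function and iterate over the list of path names, I get the error above.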
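For what it's worth, the same kind of message can be reproduced without any Excel file. The sketch below assumes (this is a guess, not something confirmed by the files) that some of the workbooks yield a table with zero columns, so `names(test)[1]` has nothing to rename. `empty` and `has_expected_cols` are hypothetical names used only for illustration.

```r
# A data frame with no columns, standing in for an empty or unreadable sheet
# (assumption: some of the 333 files end up like this after reading)
empty <- data.frame()

# Renaming the first column fails because there is no first column
res <- tryCatch({
  names(empty)[1] <- "Time"
  "renamed"
}, error = function(e) conditionMessage(e))
print(res)

# Hypothetical guard: check the column count before renaming anything
has_expected_cols <- function(df, n = 19) ncol(df) >= n
print(has_expected_cols(empty))
```

Checking `ncol()` on each file right after reading it would show which files, if any, fall into this case.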
Answer 0 (score: 0)
Here is a suggestion:
# assumes the packages from the question are loaded, plus dplyr and magrittr (mutate, filter, %<>%)
udFunction <- function(loc) {
  # read your excel file, skip the column names as you don't need them, and skip the first 8 rows
  test <- read_excel(loc, col_names = FALSE, skip = 8)
  # rename your columns (19 names in total)
  colnames(test) <- c("Time", "Sample", "Rename", "Test", paste0("Test", 2:16))
  # Copy Time column to NewColumn
  test %<>%
    mutate(NewColumn = Time) %>%
    mutate(Date = str_extract(loc, "\\d{6}")) # loc is from the path name
  test$NewColumn <- str_replace(test$NewColumn, "\\d", NA_character_) %>%
    filldown() # sets any value containing a digit to NA, then fills NAs down
  test %<>%
    filter(!str_detect(Time, "[A-Za-z]")) # drops rows whose Time contains letters
  return(test)
}
# list your xls files (the dot is escaped and the pattern anchored, so only names ending in .xls match)
files <- list.files(path = "C:/PATH.../.../2019", pattern = "\\.xls$", full.names = TRUE)
# use lapply to iterate the function over your files. This returns a list of data frames
dfs <- lapply(files, udFunction)
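Since the stated goal is to merge everything into one table, and some files apparently fail, a possible next step is to catch errors per file and then stack whatever succeeded. This is only a minimal sketch in base R: `clean_one` is a hypothetical stand-in for `udFunction`, and the file names are made up so the example runs without any Excel files.

```r
# Hypothetical stand-in for udFunction(): fails on some inputs,
# the way the real function fails on a file it cannot rename
clean_one <- function(path) {
  if (grepl("empty", path)) stop("no columns found in ", path)
  data.frame(Time = c("1", "2"), Sample = c("a", "b"), stringsAsFactors = FALSE)
}

# Made-up file names, for illustration only
files <- c("run_190101.xls", "empty_190102.xls", "run_190103.xls")

# Run the cleaner on every file; return NULL (with a message) on failure
results <- lapply(files, function(f) {
  tryCatch(clean_one(f), error = function(e) {
    message("skipping ", f, ": ", conditionMessage(e))
    NULL
  })
})
names(results) <- basename(files)

# Keep track of which files failed, and keep only the successful tables
failed  <- names(results)[vapply(results, is.null, logical(1))]
cleaned <- Filter(Negate(is.null), results)

# Tag each table with its source file, then stack them into one data frame
cleaned <- Map(function(df, nm) { df$source <- nm; df }, cleaned, names(cleaned))
combined <- do.call(rbind, cleaned)
rownames(combined) <- NULL
```

Here `failed` lists the files that need a closer look, and `combined` is the single table that could be written to the database. With dplyr loaded, `bind_rows(cleaned, .id = "source")` is an alternative to the `Map` and `rbind` steps.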