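I wrote the code below to import a series of zipped CSV files using parallel computing.
My problem is:
Some of the ZIP files (which contain the CSVs) are corrupted and cannot be opened.
After running parRapply, the only trace of the failures is the last.warning variable. It tells me which CSV failed on a node, but I cannot see all the warnings, only one at a time.
So:
To list all the warnings from all the nodes, I am thinking of using the following in my code:
warnings(DISPOIN_CSV_List <- parRapply(c1, DISPOIN_DIR_REL, parRaplly_Function))
Would this work?
Also, how can I check whether a CSV can be opened before applying the function, and create an empty data.frame for the ones that cannot?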
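One possible way to do the pre-check asked about above is to try listing the archive's contents before reading it. The sketch below uses only base R. The helper names `zip_is_readable` and `empty_dispoin_frame` are hypothetical and not part of the original script. Note that a readable table of contents does not guarantee that every member decompresses cleanly, so this is a first filter rather than a full integrity check.

```r
# Hypothetical helper: TRUE when the ZIP's table of contents can be listed
# and contains at least one entry, FALSE if listing fails or warns.
zip_is_readable <- function(zip_path) {
  tryCatch({
    listing <- utils::unzip(zip_path, list = TRUE)
    nrow(listing) > 0
  },
  error = function(e) FALSE,
  warning = function(w) FALSE)
}

# Hypothetical placeholder: a zero-row data frame with the same eleven
# character columns that the script passes to read_csv2().
empty_dispoin_frame <- function() {
  cols <- c("SCADA", "TAG", "ID_del_AEG", "Descripcion", "Time_ON",
            "Time_OFF", "Delta_Time", "Comentario", "Es_Alarma",
            "Es_Ultima", "Comentarios")
  as.data.frame(
    setNames(rep(list(character(0)), length(cols)), cols),
    stringsAsFactors = FALSE
  )
}

# A file that is not a ZIP archive is reported as unreadable.
bad_zip <- tempfile(fileext = ".zip")
writeLines("not a zip archive", bad_zip)
print(zip_is_readable(bad_zip))
```

Inside the worker function, the check could guard the read: return `empty_dispoin_frame()` when `zip_is_readable(path)` is FALSE, and call the reader otherwise.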
## -----------------------------------------------------------------------------
## Packages
## -----------------------------------------------------------------------------
# update.packages("RODBC")
# update.packages("tidyverse")
## -----------------------------------------------------------------------------
## Libraries
## -----------------------------------------------------------------------------
suppressMessages(require(RODBC))
suppressMessages(require(tidyverse))
suppressMessages(require(parallel))
## -----------------------------------------------------------------------------
## CMD: Command for DISPOIN's Directory Acquisition
## -----------------------------------------------------------------------------
# shell(cmd = 'pushd "\\srvdiscsv\data" && dir *AL*.zip /b /s > D:\DISPOIN_Data_Directories.csv && popd')
## -----------------------------------------------------------------------------
## RODBC
## -----------------------------------------------------------------------------
## A) MariaDB Connection String
con <- odbcConnect("MariaDB_Tornado24")
invisible(sqlQuery(con, "USE dispoin;"))
# B) Import R Data Directories from MariaDB
DISPOIN_DIR_REL <- as_tibble(sqlFetch(con, "dispoin.t_DISPOIN_DIR_REL"))
odbcClose(con)
# C) Import zipped CSV data into a list of data frames, which are later compiled into a single data frame by
# means of rbind
# C.1) parRapply Function Initialization:
parRaplly_Function <- function (DISPOIN_CSV_Row)
{
return(read_csv2(
file = DISPOIN_CSV_Row,
col_names = c(
"SCADA",
"TAG",
"ID_del_AEG",
"Descripcion",
"Time_ON",
"Time_OFF",
"Delta_Time",
"Comentario",
"Es_Alarma",
"Es_Ultima",
"Comentarios"),
col_types = cols(
"SCADA" = "c",
"TAG" = "c",
"ID_del_AEG" = "c",
"Descripcion" = "c",
"Time_ON" = "c",
"Time_OFF" = "c",
"Delta_Time" = "c",
"Comentario" = "c",
"Es_Alarma" = "c",
"Es_Ultima" = "c",
"Comentarios" = "c"),
locale = default_locale(),
na = c("", " "),
quoted_na = TRUE,
quote = "\"",
comment = "",
trim_ws = TRUE,
skip = 0,
n_max = Inf,
guess_max = 1000, # same value as min(1000, n_max) with n_max = Inf; n_max is not a variable in this scope
progress = FALSE))
}
# C.2) parallel Package: Environment Settings
no_cores <- detectCores()
c1 <- makeCluster(no_cores)
invisible(clusterEvalQ(c1, library(readr)))
setDefaultCluster(c1)
# C.3) parRapply Function Application:
DISPOIN_CSV_List <- parRapply(c1, DISPOIN_DIR_REL, parRaplly_Function)
suppressWarnings(stopCluster(c1))
# D) List's Tibbles Compilation into a single Tibble:
DISPOIN_CSV <- do.call(rbind, DISPOIN_CSV_List)
# E) Write Compiled Table into CSV:
write_csv(
DISPOIN_CSV,
path = file.path("D:/MySQL/R", "DISPOIN_CSV.csv"),
na = "\\N",
append = FALSE,
col_names = TRUE)
# F) Data Cleaning: Environment Variable Removal
rm(list=ls())
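As far as I know, wrapping the parRapply call in warnings() only reports what the master session itself recorded, because warnings raised on the workers are not forwarded. A minimal sketch of one alternative is to have each worker record its own warnings and errors and return them with the data. The wrapper `read_with_conditions` is a hypothetical name, and base `read.csv2` stands in for `read_csv2` so that the example runs without extra packages.

```r
library(parallel)

# Hypothetical wrapper: runs reader(path) and returns the data together with
# every warning message and the error message (NA if none) raised on the way.
read_with_conditions <- function(path, reader) {
  warns <- character(0)
  err <- NA_character_
  data <- withCallingHandlers(
    tryCatch(
      reader(path),
      error = function(e) {
        err <<- conditionMessage(e)  # remember why the read failed
        NULL                         # no data for this file
      }
    ),
    warning = function(w) {
      warns <<- c(warns, conditionMessage(w))  # keep every warning
      invokeRestart("muffleWarning")           # then carry on reading
    }
  )
  list(path = path, data = data, warnings = warns, error = err)
}

# Demo input: one valid file and one path that does not exist.
good <- tempfile(fileext = ".csv")
writeLines(c("a;b", "1;2"), good)
missing <- tempfile(fileext = ".csv")  # never created, so reading it fails

cl <- makeCluster(2)
results <- parLapply(cl, c(good, missing), read_with_conditions,
                     reader = utils::read.csv2)
stopCluster(cl)

# One row per file: how many warnings it produced and its error, if any.
report <- do.call(rbind, lapply(results, function(r) {
  data.frame(path = r$path, n_warnings = length(r$warnings),
             error = r$error, stringsAsFactors = FALSE)
}))
print(report)
```

The full warning text for each file stays available in `results[[i]]$warnings`, so nothing is limited to the single entry that last.warning holds.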
I asked the same question on the r-help mailing list, and this is the answer I got:
Use tryCatch().
Instead of
result <- read_csv2(file)
use
result <- tryCatch(read_csv2(file), error=function(e) makeEmptyDataFrame(conditionMessage(e)))
where: