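I use RMySQL and a MySQL database to store my datasets. Sometimes the data gets revised, and sometimes I store results back to the database. In short, there is quite a lot of interaction between R and the database in my use case.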
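Most of the time I use convenience functions like dbWriteTable and dbReadTable to write and read my data. Unfortunately, these completely ignore R data types and MySQL field types. What I mean is that I would expect MySQL date fields to end up in a Date or POSIX class. The other way around, I would expect these R classes to be stored as a roughly corresponding MySQL field type. That is, a date should not become character. I am not asking for a distinction between float and double here...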
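I also tried dbGetQuery, with the same result. Is there something I have completely missed in the manual, or is this simply not possible (yet) with these packages? What would be a good workaround?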
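Integer columns are usually imported as R integer vectors, except for cases such as BIGINT or UNSIGNED INTEGER, which are coerced to R's double precision vectors to avoid truncation (currently R's integers are signed 32-bit quantities).
Time variables are imported/exported as character data, so you need to convert these to your favorite date/time representation.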
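The conversion mentioned above can be tried without a database. This is a minimal sketch that uses a mock data frame in place of a real query result; the column names are made up, and UTC is only an assumption because a MySQL DATETIME carries no time zone.

```r
# Mock of what dbReadTable()/dbGetQuery() hand back for DATE and DATETIME
# columns: plain character vectors. Column names are hypothetical.
res <- data.frame(
  day      = c("2012-03-01", "2012-03-02"),
  recorded = c("2012-03-01 14:30:00", "2012-03-02 09:15:00"),
  stringsAsFactors = FALSE
)

# MySQL DATE uses the ISO layout that as.Date() parses by default.
res$day <- as.Date(res$day)

# UTC is an assumption here; pick the zone your data was recorded in.
res$recorded <- as.POSIXct(res$recorded, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

str(res)
```

After this step date arithmetic such as diff(res$day) works, which it would not on the character columns.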
Answer 0 (score: 5)
OK, I have a working solution now. Here is a function that maps MySQL field types to R classes. It helps in particular with handling the MySQL field type date...
library(RMySQL)

dbReadMap <- function(con, table) {
  # DESCRIBE returns one row per column; keep only Field and Type
  desc <- dbGetQuery(con, paste("DESCRIBE", table))[, 1:2]
  # drop row_names: dbReadTable turns it into row names, so it is not a real column
  desc <- desc[desc$Field != "row_names", ]
  # reduce each type to its base name, e.g. "int(11) unsigned" -> "int"
  desc$Type <- sub("[^a-z].*$", "", tolower(desc$Type))
  # dictionary: MySQL field type -> name of the R conversion function
  fieldtype_to_rclass <- data.frame(
    fieldtypes = c("int", "tinyint", "bigint", "float", "double", "date", "char", "varchar", "text"),
    rclasses = c("as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.Date", "as.character", "as.character", "as.character"),
    stringsAsFactors = FALSE
  )
  # columns whose type is not in the dictionary are left as returned
  map <- merge(fieldtype_to_rclass, desc, by.x = "fieldtypes", by.y = "Type")
  # get data
  res <- dbReadTable(con, table)
  # seq_len() skips the loop cleanly when no column matched the dictionary
  for (i in seq_len(nrow(map))) {
    field <- as.character(map$Field[i])
    res[[field]] <- match.fun(map$rclasses[i])(res[[field]])
  }
  res
}
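Maybe this is not good programming practice; I just don't know any better. So use it at your own risk, or help me improve it... And of course it is only half of the job: reading. Hopefully I will find some time soon to write a writing function.
If you have suggestions for the mapping dictionary, let me know :)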
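For the writing half, one possible starting point is to derive a MySQL type per column from the R classes and hand that to dbWriteTable. The sketch below is not from the answer: mysql_field_types is a hypothetical helper, the type choices are assumptions, and it relies on dbWriteTable in RMySQL accepting a field.types argument.

```r
# Hypothetical helper: choose a MySQL column type for each column of a data frame.
# The type choices are assumptions, not rules taken from RMySQL.
mysql_field_types <- function(df) {
  pick <- function(x) {
    if (inherits(x, "Date"))    return("date")
    if (inherits(x, "POSIXct")) return("datetime")
    if (is.integer(x))          return("int")
    if (is.numeric(x))          return("double")
    if (is.logical(x))          return("tinyint")
    "text"  # fallback for character, factor and anything else
  }
  lapply(df, pick)  # named list with one entry per column
}

df <- data.frame(id = 1:2, value = c(0.5, 1.5),
                 day = as.Date(c("2012-03-01", "2012-03-02")),
                 label = c("a", "b"), stringsAsFactors = FALSE)
types <- mysql_field_types(df)
str(types)

# With an open connection `con` (not created here) the list could be passed on:
# dbWriteTable(con, "mytable", df, field.types = types, row.names = FALSE)
```

The helper itself runs without a database, so the chosen types can be inspected before anything is written.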
Answer 1 (score: 1)
Here is a more general version of @Matt Bannert's function; it works on queries instead of tables:
# Extension to dbGetQuery that understands MySQL data types
dbGetQuery2 <- function(con, query) {
  # materialise the query as a temporary table so DESCRIBE can report its column types;
  # clear each result set so it does not stay pending on the connection
  dbClearResult(dbSendQuery(con, paste0("CREATE TEMPORARY TABLE `temp` ", query)))
  desc <- dbGetQuery(con, "DESCRIBE `temp`")[, 1:2]
  dbClearResult(dbSendQuery(con, "DROP TABLE `temp`"))
  # skip row_names: it holds row labels, not data, so it is left unconverted
  desc <- desc[desc$Field != "row_names", ]
  # reduce each type to its base name, e.g. "int(11) unsigned" -> "int"
  desc$Type <- sub("[^a-z].*$", "", tolower(desc$Type))
  # dictionary: MySQL field type -> name of the R conversion function
  fieldtype_to_rclass <- data.frame(
    fieldtypes = c("int", "tinyint", "bigint", "float", "double", "date", "char", "varchar", "text"),
    rclasses = c("as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.numeric", "as.Date", "as.character", "as.factor", "as.character"),
    stringsAsFactors = FALSE
  )
  # columns whose type is not in the dictionary are left as returned
  map <- merge(fieldtype_to_rclass, desc, by.x = "fieldtypes", by.y = "Type")
  # get data
  res <- dbGetQuery(con, query)
  # seq_len() skips the loop cleanly when no column matched the dictionary
  for (i in seq_len(nrow(map))) {
    field <- as.character(map$Field[i])
    res[[field]] <- match.fun(map$rclasses[i])(res[[field]])
  }
  res
}
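The type lookup used by both functions can be checked without a MySQL server. The following is only a sketch on mock data: the desc data frame stands in for the first two columns of DESCRIBE output, the column names and types are made up, and a named vector is used as the dictionary in place of merge.

```r
# Mock of the first two columns of DESCRIBE output; names and types are made up.
desc <- data.frame(
  Field = c("id", "price", "day", "label"),
  Type  = c("int(11) unsigned", "double", "date", "varchar(20)"),
  stringsAsFactors = FALSE
)

# Keep only the leading base type: "int(11) unsigned" -> "int".
desc$Type <- sub("[^a-z].*$", "", tolower(desc$Type))

# Dictionary as a named vector: MySQL base type -> name of an R conversion function.
dict <- c(int = "as.numeric", double = "as.numeric",
          date = "as.Date", varchar = "as.character")

# Mock of the raw result, with the date column arriving as character.
res <- data.frame(id = 1:2, price = c(9.5, 10),
                  day = c("2012-03-01", "2012-03-02"),
                  label = c("a", "b"), stringsAsFactors = FALSE)

for (i in seq_len(nrow(desc))) {
  if (!desc$Type[i] %in% names(dict)) next  # unknown types stay untouched
  field <- desc$Field[i]
  res[[field]] <- match.fun(dict[[desc$Type[i]]])(res[[field]])
}

sapply(res, class)
```

Looking up by name keeps the column order of the result intact and makes it easy to see which types have no conversion yet.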