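A few posters here have asked similar questions, and their answers got me 80% of the way to reading a text file containing a SQL query into R as input for RODBC: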
Import multiline SQL query to single string
RODBC Temporary Table Issue when connecting to MS SQL Server
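However, my SQL files contain quite a few comments (written as --comment on this and that). My question is: how can I either strip the comment lines from the query on import, or make sure the resulting string keeps its line breaks so that the actual query is not appended to a comment?
For example, query6.sql: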
--query 6
select a6.column1,
a6.column2,
count(a6.column3) as counts
--count the number of occurences in table 1
from data.table a6
group by a6.column1
becomes:
sqlStr <- gsub("\t","", paste(readLines(file('SQL/query6.sql', 'r')), collapse = ' '))
sqlStr
"--query 6select a6.column1, a6.column2, count(a6.column3) as counts --count the number of occurences in table 1from data.table a6 group by a6.column1"
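when read into R.
Answer 0 (score: 2)
Are you sure you can't just use it as is? This works even though the query spans multiple lines and contains a comment: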
> library(sqldf)
> sql <- "select * -- my select statement
+ from BOD
+ "
> sqldf(sql)
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
This works too:
> sql2 <- c("select * -- my select statement", "from BOD")
> sql2.paste <- paste(sql2, collapse = "\n")
> sqldf(sql2.paste)
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
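Applied to a file like the one in the question, the same idea can be sketched as follows: collapse the lines with "\n" instead of a space, so that each -- comment still ends at its own line break. This is a minimal sketch; the temporary file is only a stand-in for query6.sql, and it assumes the database driver accepts comments and newlines in the query string, as sqldf does above.

```r
# Write a small stand-in for query6.sql to a temporary file (illustrative only).
sql_path <- tempfile(fileext = ".sql")
writeLines(c(
  "--query 6",
  "select a6.column1,",
  "a6.column2",
  "--a comment in the middle",
  "from data.table a6"
), sql_path)

# Collapse with a newline so a comment never swallows the line after it.
sql_str <- paste(readLines(sql_path), collapse = "\n")
cat(sql_str, "\n")

unlink(sql_path)
```

The resulting string can then be passed to the query function as a single argument.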
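Answer 1 (score: 2)
I had trouble with the other answer, so I modified Roman's and turned it into a small function. It works for all my test cases, including multiple comments, full-line comments, and partial-line comments.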
read.sql <- function(filename, silent = TRUE) {
  q <- readLines(filename, warn = !silent)
  q <- q[!grepl(pattern = "^\\s*--", x = q)] # remove full-line comments
  q <- sub(pattern = "--.*", replacement = "", x = q) # remove midline comments
  q <- paste(q, collapse = " ")
  return(q)
}
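A quick way to try the function is to run it on a small temporary file. The sketch below repeats read.sql so that it runs on its own; the file contents are a made-up test case, not the question's real query file.

```r
# read.sql as defined in the answer above, repeated so the sketch is self-contained.
read.sql <- function(filename, silent = TRUE) {
  q <- readLines(filename, warn = !silent)
  q <- q[!grepl(pattern = "^\\s*--", x = q)]           # remove full-line comments
  q <- sub(pattern = "--.*", replacement = "", x = q)  # remove trailing comments
  paste(q, collapse = " ")
}

# Hypothetical test file with one full-line comment and one trailing comment.
sql_path <- tempfile(fileext = ".sql")
writeLines(c(
  "--query 6",
  "select a6.column1, --first column",
  "a6.column2",
  "from data.table a6"
), sql_path)

cleaned <- read.sql(sql_path)
print(cleaned)

unlink(sql_path)
```

The printed string should contain no -- markers, with the remaining lines joined by spaces.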
Answer 2 (score: 1)
Something like this?
> cat("--query 6
+ select a6.column1,
+ a6.column2,
+ count(a6.column3) as counts
+ --count the number of occurences in table 1
+ from data.table a6
+ group by a6.column1", file = "query6.sql")
>
> my.q <- readLines("query6.sql")
Warning message:
In readLines("query6.sql") : incomplete final line found on 'query6.sql'
> my.q
[1] "--query 6" "select a6.column1, "
[3] "a6.column2," "count(a6.column3) as counts"
[5] "--count the number of occurences in table 1 " "from data.table a6"
[7] "group by a6.column1"
> find.com <- grepl("--", my.q)
>
> my.q <- my.q[!find.com]
> paste(my.q, collapse = " ")
[1] "select a6.column1, a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1"
>
> unlink("query6.sql")
> rm(list = ls())
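One thing to watch with grepl("--", my.q): it flags every line that contains -- anywhere, so a line of SQL with a trailing comment would be dropped along with the comment. The sketch below compares it with an anchored pattern on a made-up vector; the variable names drop_any and drop_full are illustrative.

```r
my.q <- c("--query 6",
          "select a6.column1, --first column",
          "from data.table a6")

# Unanchored: matches "--" anywhere, so the select line is dropped as well.
drop_any <- grepl("--", my.q)
# Anchored: matches only lines that start with "--" (after optional whitespace).
drop_full <- grepl("^\\s*--", my.q)

print(my.q[!drop_any])
print(my.q[!drop_full])
```

With the anchored pattern the select line survives, although its trailing comment is still attached and would need a separate substitution.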
Answer 3 (score: 0)
You can use readChar() instead of readLines(). I kept running into problems with mixed comments (-- or /* */), and this has always worked well for me.
sql <- readChar(path.to.file, file.size(path.to.file))
query <- sqlQuery(con, sql, stringsAsFactors = TRUE)
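To see what readChar() returns without a database connection, here is a minimal sketch that writes a temporary file and reads it back. The file contents are made up, and whether the server accepts the embedded comments depends on the database and driver, so treat that part as an assumption.

```r
# Hypothetical file with both comment styles.
path.to.file <- tempfile(fileext = ".sql")
writeLines(c("/* block comment */",
             "select * -- trailing comment",
             "from BOD"), path.to.file)

# Read the whole file as a single string; line breaks are kept as they are.
sql <- readChar(path.to.file, file.size(path.to.file))
cat(sql)

unlink(path.to.file)
```

Because the line breaks survive, a -- comment still ends at the end of its own line when the string is sent as a query.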
Answer 4 (score: 0)
The clean_query function:
library(tidyverse)
# pass in either a text query or path to a sql file
clean_query <- function( text_or_path = '//example/path/to/some_query.sql' ){
  # if sql path, read, otherwise assume text input
  if( str_detect(text_or_path, "(?i)\\.sql$") ){
    text_or_path <- text_or_path %>% read_lines() %>% str_c(sep = " ", collapse = "\n")
  }
  # echo original query to the console
  # (unnecessary, but helpful for status if passing sequential queries to a db)
  cat("\nThe query you're processing is: \n", text_or_path, "\n\n")
  # return
  text_or_path %>%
    # remove all delimited /* */ sql comments
    gsub(pattern = '/\\*.*?\\*/', replacement = ' ') %>%
    # remove all -- comments up to the end of the line
    gsub(pattern = '--[^\r\n]*', replacement = ' ') %>%
    # remove everything after the query-end semicolon
    gsub(pattern = ';.*', replacement = ' ') %>%
    # remove any line break, tab, etc.
    gsub(pattern = '[\r\n\t\f\v]', replacement = ' ') %>%
    # remove extra whitespace
    gsub(pattern = ' +', replacement = ' ')
}
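You could chain the regular expressions together into one long, unreadable expression if you wanted to, but I recommend keeping the code readable.
Output for the question's query6.sql example:
[1] " select a6.column1, a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1 "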
query <- "
/* this query has
intentionally messy
comments
*/
Select
COL_A -- with a comment here
,COL_B
,COL_C
FROM
-- and some helpful comment here
Database.Datatable
;
-- or wherever
/* and some more comments here */
"
Call the function:
clean_query(query)
Output:
[1] " Select COL_A ,COL_B ,COL_C FROM Database.Datatable "
If you want to test reading from a .sql file:
temp_path <- path.expand("~/query.sql")
cat(query, file = temp_path)
clean_query(temp_path)
file.remove(temp_path)
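One caveat with regex-based stripping: the patterns cannot tell a comment from the same characters inside a string literal. The sketch below applies the -- pattern from clean_query to a made-up query using base R only; the query text is purely illustrative.

```r
# A query whose string literal happens to contain "--".
query <- "select '--not a comment' as label from t"

# Same pattern as the "--" step in clean_query.
stripped <- gsub(pattern = "--[^\r\n]*", replacement = " ", x = query)
print(stripped)
```

Everything from the -- inside the quotes to the end of the line is removed, so queries containing such literals would need a real SQL tokenizer or manual handling.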