Import an SQL query with comments from a file into R

Date: 2013-12-17 11:54:41

Tags: sql r

A couple of posters here have asked similar questions, and those got me 80% of the way to reading a text file containing an SQL query into R as input to RODBC:

Import multiline SQL query to single string

RODBC Temporary Table Issue when connecting to MS SQL Server

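However, my SQL files contain quite a few comments (as --comment on this and that). My question is: how can I remove the comment lines from the query on import, or make sure the resulting string keeps its line breaks so that the actual query isn't appended to a comment?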

For example, query6.sql:

--query 6
select a6.column1, 
    a6.column2,
    count(a6.column3) as counts
--count the number of occurences in table 1 
from data.table a6
group by a6.column1

When read into R, it becomes:

sqlStr <- gsub("\t","", paste(readLines(file('SQL/query6.sql', 'r')), collapse = ' '))
sqlStr 
"--query 6select a6.column1, a6.column2, count(a6.column3) as counts --count the number of occurences in table 1from data.table a6 group by a6.column1"

so the entire query ends up inside the first -- comment.

5 answers:

Answer 0 (score: 2)

Are you sure you can't just use it as is? This works even though it spans multiple lines and contains a comment:

> library(sqldf)
> sql <- "select * -- my select statement
+ from BOD
+ "
> sqldf(sql)
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8

This also works:

> sql2 <- c("select * -- my select statement", "from BOD")
> sql2.paste <- paste(sql2, collapse = "\n")
> sqldf(sql2.paste)
  Time demand
1    1    8.3
2    2   10.3
3    3   19.0
4    4   16.0
5    5   15.6
6    7   19.8
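The same idea applies when the query is stored in a file: collapsing the lines with "\n" instead of " " keeps each -- comment confined to its own line. A minimal sketch, assuming a file laid out like the question's query6.sql; read_sql_keep_newlines is a hypothetical helper name, not part of sqldf or RODBC:

```r
# Hypothetical helper: read a .sql file into one string, keeping the line
# breaks so every "--" comment still ends where its line ends.
read_sql_keep_newlines <- function(path) {
  lines <- readLines(path, warn = FALSE)  # warn = FALSE: no "incomplete final line" warning
  paste(lines, collapse = "\n")           # newline, not space, between lines
}

# Demo on a temporary file shaped like the question's query
path <- tempfile(fileext = ".sql")
writeLines(c("--query 6",
             "select a6.column1,",
             "--count per column1",
             "from data.table a6"), path)
sql <- read_sql_keep_newlines(path)
cat(sql)
```

The resulting string can be handed to sqldf() (or to RODBC's sqlQuery()) unchanged, just like sql2.paste above.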

Answer 1 (score: 2)

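I had trouble with the other answer, so I modified Roman's and turned it into a small function. It works for all my test cases, including multiple comments, full-line comments, and partial-line (trailing) comments.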

read.sql <- function(filename, silent = TRUE) {
    q <- readLines(filename, warn = !silent)
    q <- q[!grepl(pattern = "^\\s*--", x = q)] # remove full-line comments
    q <- sub(pattern = "--.*", replacement="", x = q) # remove midline comments
    q <- paste(q, collapse = " ")
    return(q)
}

Answer 2 (score: 1)

Something like this?

> cat("--query 6
+ select a6.column1, 
+ a6.column2,
+ count(a6.column3) as counts
+ --count the number of occurences in table 1 
+ from data.table a6
+ group by a6.column1", file = "query6.sql")
> 
> my.q <- readLines("query6.sql")
Warning message:
In readLines("query6.sql") : incomplete final line found on 'query6.sql'
> my.q
[1] "--query 6"                                    "select a6.column1, "                         
[3] "a6.column2,"                                  "count(a6.column3) as counts"                 
[5] "--count the number of occurences in table 1 " "from data.table a6"                          
[7] "group by a6.column1"                         
> find.com <- grepl("--", my.q)
> 
> my.q <- my.q[!find.com]
> paste(my.q, collapse = " ")
[1] "select a6.column1,  a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1"
> 
> unlink("query6.sql")
> rm(list = ls())
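One caveat with this approach: grepl("--", my.q) drops every line that contains --, so a comment at the end of a code line takes that code with it. The sketch below uses made-up input lines (the question's query with one trailing comment added) and compares dropping whole lines with stripping only the comment text, the same idea as the sub() call in answer 1:

```r
# Lines as readLines() would return them, with one trailing comment added
my.q <- c("--query 6",
          "select a6.column1, ",
          "a6.column2,",
          "count(a6.column3) as counts -- trailing comment",
          "from data.table a6",
          "group by a6.column1")

# Dropping every line containing "--" also loses the count(...) column
dropped <- paste(my.q[!grepl("--", my.q)], collapse = " ")

# Stripping only the text from "--" onwards keeps the code before it;
# lines that become blank (full-line comments) are then removed
stripped <- sub("--.*$", "", my.q)
stripped <- stripped[nzchar(trimws(stripped))]
kept <- paste(stripped, collapse = " ")

dropped
kept
```

Like every regex-based approach in this thread, this would also cut a -- that appears inside a quoted string literal.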

Answer 3 (score: 0)

You can use readChar() instead of readLines(). I kept running into problems with mixed comments (-- and /* */), and this has always worked well for me.

sql <- readChar(path.to.file, file.size(path.to.file))
query <- sqlQuery(con, sql, stringsAsFactors = TRUE)
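readChar() returns the whole file as a single string with its newline characters intact, which is why -- comments still end at the right place when the server parses the query. A small check on a temporary file (the file contents are invented for the demo, and no database connection is opened here):

```r
# Write a small .sql file with both comment styles (temporary, demo only)
path <- tempfile(fileext = ".sql")
writeLines(c("/* header comment */",
             "select *  -- all columns",
             "from BOD"), path)

# Read the entire file as one string; newlines are kept
sql <- readChar(path, file.size(path))
cat(sql)
```

The resulting sql is what the answer passes to sqlQuery(con, sql); the comments are left for the database to handle.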

Answer 4 (score: 0)

Summary

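The clean_query function:

  • removes all comments, of both styles (-- and /* */)
  • returns a single-string output
  • accepts either a path to a .sql file or the query text itself
  • is simple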

Code

require(tidyverse)

# pass in either a text query or path to a sql file
clean_query <- function( text_or_path = '//example/path/to/some_query.sql' ){


  # if sql path, read, otherwise assume text input
  if( str_detect(text_or_path, "(?i)\\.sql$") ){

    text_or_path <- text_or_path %>% read_lines() %>% str_c(sep = " ", collapse = "\n")

  }


  # echo original query to the console 
  #  (unnecessary, but helpful for status if passing sequential queries to a db)
  cat("\nThe query you're processing is: \n", text_or_path, "\n\n")


  # return
  text_or_path %>% 
    # remove all demarked /*  */ sql comments 
    gsub(pattern = '/\\*.*?\\*/', replacement = ' ') %>% 
    # remove all demarked -- comments 
    gsub(pattern = '--[^\r\n]*', replacement = ' ') %>% 
    # remove everything after the query-end semicolon 
    gsub(pattern = ';.*', replacement = ' ') %>% 
    #remove any line break, tab, etc.
    gsub(pattern = '[\r\n\t\f\v]', replacement = ' ') %>%  
    # remove extra whitespace 
    gsub(pattern = ' +', replacement = ' ') 

}
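If you would rather not load the tidyverse, the same steps can be written in base R. This is a sketch, not the answerer's code: clean_query_base is a hypothetical name, it drops the console echo, and since perl = TRUE is used for the block-comment and semicolon patterns, (?s) is added so that . also matches line breaks there.

```r
# Hypothetical base-R port of clean_query (no tidyverse dependency)
clean_query_base <- function(text_or_path) {
  # Input ending in ".sql" (any case) is treated as a file path, else as query text
  if (grepl("\\.sql$", text_or_path, ignore.case = TRUE)) {
    text_or_path <- paste(readLines(text_or_path, warn = FALSE), collapse = "\n")
  }
  q <- text_or_path
  q <- gsub("(?s)/\\*.*?\\*/", " ", q, perl = TRUE)  # /* ... */ comments, may span lines
  q <- gsub("--[^\r\n]*", " ", q)                     # -- comments up to end of line
  q <- sub("(?s);.*", " ", q, perl = TRUE)            # everything from the first semicolon on
  q <- gsub("[\r\n\t\f\v]", " ", q)                   # line breaks, tabs, etc. become spaces
  gsub(" +", " ", q)                                  # collapse runs of spaces
}

q <- "/* block\ncomment */\nSelect\n  COL_A -- note\n  ,COL_B\nFROM t;\n-- trailing"
clean_query_base(q)
```

As with the original, a -- or /* inside a quoted string literal is treated as a comment, and a semicolon inside a literal cuts the query short.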

You could chain the regular expressions into one long, hard-to-read expression if you like, but I recommend keeping the code readable.



Output for "query6.sql":

[1] " select a6.column1, a6.column2, count(a6.column3) as counts from data.table a6 group by a6.column1 "



Another text input example:

query <- "

    /* this query has 
    intentionally messy 
    comments
    */

    Select 
       COL_A -- with a comment here
      ,COL_B
      ,COL_C
    FROM 
      -- and some helpful comment here
      Database.Datatable
    ;
    -- or wherever

    /* and some more comments here */

"

Calling the function:

clean_query(query)

Output:

[1] " Select COL_A ,COL_B ,COL_C FROM Database.Datatable "



To test reading from a .sql file:

temp_path <- path.expand("~/query.sql")

cat(query, file = temp_path)

clean_query(temp_path)

file.remove(temp_path)