Putting sqldf inside an R function

Time: 2017-11-11 18:44:48

Tags: r function sqldf

Is there a way to put an sqldf query inside a user-defined function? I have already gone through these: http://r.789695.n4.nabble.com/Passing-Multiple-Variable-Into-SQLDF-Statement-as-parameters-of-function-td4636147.html and "R call variable inside sqldf".

My sample code is as follows:

library(sqldf)

db1 = data.frame(a = c(1,2,3), b = c("a","b","c"))
db2 = data.frame(a = c(1,2,3), b = c("b","a","c"))
db = list(db1,db2)

extrct = function(x){
  # spaces around the table name keep it separate from the SQL keywords
  Example = paste0("select * from ", x, " where b = '", "b", "'")
  sqldf(Example, verbose = TRUE)
}

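I have many databases, and as long as sqldf works inside a function, it would be easy to write SAS-macro-style code to extract the data. I have already written R code for some small processes, but there are many complex SQL programs that would be easier in sqldf. Thanks in advance.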

2 answers:

Answer 0: (score: 3)

Try this:

library(sqldf)

extract <- function(x, envir = parent.frame(), verbose = TRUE, ...) {
  fn$sqldf("select * from [$x] where b = 'b'", envir = envir, verbose = verbose, ...)
}

# sample runs
extract("db1")
extract("db2")

Map(extract, c("db1", "db2"))

db <- setNames(db, c("db1", "db2"))
lapply(names(db), extract, envir = list2env(db))

If we change the last line to the following, the output will have component names but will otherwise be the same:

sapply(names(db), extract, envir = list2env(db), simplify = FALSE)
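To see why this works, it may help to look at the string that fn$ builds before sqldf runs it. The sketch below is an illustration rather than part of the original answer: it uses fn$identity() (fn comes from the gsubfn package, which sqldf attaches) to show that $x is replaced by the value of x in the calling environment. The variable name sql is an assumption made for the example.

```r
library(sqldf)  # attaching sqldf also attaches gsubfn, which provides fn

x <- "db1"

# fn$ interpolates $x into the string before calling the function after the $.
# identity() simply returns the interpolated string, so we can inspect it.
sql <- fn$identity("select * from [$x] where b = 'b'")
print(sql)
```

The square brackets around the table name are SQLite identifier quoting, so the interpolated name is treated as a single table name.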

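Answer 1: (score: 2)

This is how I would write it. sqldf works more naturally with strings than with NSE (non-standard evaluation), so just pass in the name of the source data.frame as a string.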

library(sqldf); requireNamespace("checkmate")
db1 <- data.frame(a = c(1,2,3), d = c("a","b","c"), stringsAsFactors = FALSE)

extract <- function( table_name, criteria_d ) {
  checkmate::assert_character(table_name, min.chars=1, len=1, any.missing=FALSE)
  checkmate::assert_character(criteria_d, min.chars=1, len=1, any.missing=FALSE)

  # Half-way attempt to prevent sql-injection. Values would need to be only numbers, letters, and underscores.
  checkmate::assert_character(table_name, pattern="^\\w+$", len=1, any.missing=FALSE)
  checkmate::assert_character(criteria_d, pattern="^\\w+$", len=1, any.missing=FALSE)

  sql <- paste0("select * from [", table_name , "] where d ='", criteria_d, "'")

  cat("Executing: `", sql, "`\n", sep="")
  sqldf(sql, verbose=FALSE) 
}

extract("db1", "b")

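If for some reason you have the variable as an unevaluated name (a symbol) rather than as a string, deparse it first, because the checks above accept only character input: extract(deparse(quote(db1)), "b")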

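A few notes.

  1. I changed the name of the variable to "d" to make things clearer.
  2. I assumed that db2 and db were not relevant to your scenario.
  3. I tried not to change your code too much. If this function is ever connected to a real database, guard against SQL injection.
  4. If your SQL is a little more complicated, consider using glue::glue_sql()
  5. Edited in response to @Sayak's comment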

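    Use purrr::map_df() to loop over the vector of data.frame names:

    library(magrittr)  # provides %>%
    # assumes db2 <- data.frame(a = c(1,2,3), d = c("b","a","c")) is defined as well
    c("db1", "db2") %>% 
      purrr::map_df(extract, "b") 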
    

    This combines the results into a single data.frame:

    Executing: `select * from [db1] where d ='b'`
    Executing: `select * from [db2] where d ='b'`
      a d
    1 2 b
    2 1 b
    

    This is nice because it does not require a follow-up call to dplyr::bind_rows()

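    If you need to vary the criterion (so that it is not always "b"), use purrr::pmap_df() and pack the inputs into a data.frame whose columns match the parameters of your extract() function: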

    ds_input <- tibble::tribble(
      ~table_name, ~criteria_d,
      "db1",         "b",
      "db1",         "c",
      "db2",         "c"
    )
    
    ds_input %>% 
      purrr::pmap_df(extract)
    
    # Executing: `select * from [db1] where d ='b'`
    # Executing: `select * from [db1] where d ='c'`
    # Executing: `select * from [db2] where d ='c'`
    #   a d
    # 1 2 b
    # 2 3 c
    # 3 3 c
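
As a rough sketch of note 4 above, the query can also be built with glue::glue_sql(), which quotes identifiers and values instead of pasting them in by hand. This assumes the glue, DBI, and RSQLite packages are installed. The function name extract_glue and the throwaway in-memory connection are assumptions made for illustration and are not part of the original answer.

```r
library(sqldf)

db1 <- data.frame(a = c(1, 2, 3), d = c("a", "b", "c"), stringsAsFactors = FALSE)

# Hypothetical helper, not from the original answer.
extract_glue <- function(table_name, criteria_d) {
  # glue_sql() needs a connection only to learn the quoting rules;
  # an in-memory SQLite connection is used here and closed on exit.
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
  on.exit(DBI::dbDisconnect(con), add = TRUE)

  # Backticks inside the braces quote an identifier (the table name);
  # plain braces quote a value (the criterion).
  sql <- glue::glue_sql(
    "select * from {`table_name`} where d = {criteria_d}",
    .con = con
  )

  # sqldf still runs the query against the data.frame named in the SQL.
  sqldf(as.character(sql))
}

res <- extract_glue("db1", "b")
print(res)
```

Because the quoting is handled by glue_sql(), the regular-expression checks on the inputs become a second line of defence rather than the only one.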