Question

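I am using the tidyverse to connect to multiple databases (a cluster) that share the same data structure. Because the data come from different database sources, a union is not possible without a local copy.

I can do everything with long-form code, but now I am trying to shorten the code and have run into the following problem. When I define the column names for the select statement, dbplyr stores the expression together with the loop variable in the lazy query, instead of evaluating it and storing the resulting strings.

Here is a minimal reproducible example: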

library(tidyverse)

#reproducible example with two databases and two tables in memory
con1 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
con2 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con1, mtcars)
copy_to(con1, iris)
copy_to(con2, mtcars)
copy_to(con2, iris)

#names of the tables
tables<-c("mtcars", "iris")

#specify which columns to select from which table
columns<-list("mtcars"=c("mpg", "hp"), 
              "iris"=c("Sepal.Length", "Sepal.Width"))

#list to hold the lazy tables, one entry per table name
data_list<-vector(mode="list", length=length(tables))
names(data_list)<-tables

#loop over tables
for(i in tables){
  #loop over databases
  for(j in 1:2)
    data_list[[i]][[j]]<-tbl(get(paste0("con",j)), i)%>%select(columns[[i]])
}

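So far this code runs without errors, but the problem appears when I access the data stored in the list (data_list).

If I try

data_list[[1]][[1]]

R still evaluates

select(columns[[i]])

after the loop has finished, when i is "iris", and the statement fails with the error message:

Error: Unknown columns `Sepal.Length` and `Sepal.Width`

For the second list in data_list it works fine, because i is still set appropriately:

data_list[[2]][[1]]

How can I force the select statement to evaluate the expression instead of storing it together with the loop variable i?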

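In the next step I would also like to add a filter expression, so that I do not have to collect all the data, only the data I need.

If union could handle the databases without a local copy, that would solve all of these problems.

Thanks and best regards, Thomas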

Answer 1

Hmm, do you mean that you want to select the columns interactively after querying the database?

I edited your code to use purrr functions such as map2() and pluck() (since you have already loaded the tidyverse).

library(tidyverse)

# Reproducible example with two databases and two tables in memory
con1 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
con2 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
copy_to(con1, mtcars)
copy_to(con1, iris)
copy_to(con2, mtcars)
copy_to(con2, iris)

# Specify which columns to select from which table
columns <- list("mtcars" = c("mpg", "hp"),
                "iris" = c("Sepal.Length", "Sepal.Width"))

# Loop over the table names (mtcars, iris) **and** the columns that belong to those datasets
data_list <-
  map2(names(columns), columns, ~ {
    # For each table/column combination, grab them from con1 and con2 and return them in a list
    con1_db <- tbl(con1, .x) %>% select(.y)
    con2_db <- tbl(con2, .x) %>% select(.y)

    list(con1_db, con2_db)
  }) %>%
  setNames(names(columns))

# With this you can interactively select the columns you wanted for each dataset.
# Just replace the name of the dataset that you're interested in.
data_list %>%
  pluck("iris") %>%
  map(select, columns[['iris']])
#> [[1]]
#> Warning: `overscope_eval_next()` is deprecated as of rlang 0.2.0.
#> Please use `eval_tidy()` with a data mask instead.
#> This warning is displayed once per session.
#> Warning: `overscope_clean()` is deprecated as of rlang 0.2.0.
#> This warning is displayed once per session.
#> # Source:   lazy query [?? x 2]
#> # Database: sqlite 3.30.1 [:memory:]
#>    Sepal.Length Sepal.Width
#>           <dbl>       <dbl>
#>  1          5.1         3.5
#>  2          4.9         3
#>  3          4.7         3.2
#>  4          4.6         3.1
#>  5          5           3.6
#>  6          5.4         3.9
#>  7          4.6         3.4
#>  8          5           3.4
#>  9          4.4         2.9
#> 10          4.9         3.1
#> # … with more rows
#>
#> [[2]]
#> # Source:   lazy query [?? x 2]
#> # Database: sqlite 3.30.1 [:memory:]
#>    Sepal.Length Sepal.Width
#>           <dbl>       <dbl>
#>  1          5.1         3.5
#>  2          4.9         3
#>  3          4.7         3.2
#>  4          4.6         3.1
#>  5          5           3.6
#>  6          5.4         3.9
#>  7          4.6         3.4
#>  8          5           3.4
#>  9          4.4         2.9
#> 10          4.9         3.1
#> # … with more rows
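If the goal is to keep the loop from the question, another option might be to force the column names to be evaluated when select() is called. This is only a minimal sketch, and it assumes the error comes from the lazy query keeping the unevaluated expression columns[[i]]. The rlang unquote operator !! inlines the current value instead. The names cons, cols, min_mpg, mtcars_1 and efficient are made up for this illustration and do not appear in the original post.

```r
library(dplyr)
library(dbplyr)

# Two in-memory SQLite databases holding the same tables, as in the question
con1 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
con2 <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
cons <- list(con1, con2)  # hypothetical helper list, replaces get(paste0("con", j))
for (con in cons) {
  copy_to(con, mtcars)
  copy_to(con, iris)
}

# Which columns to select from which table
columns <- list(
  mtcars = c("mpg", "hp"),
  iris   = c("Sepal.Length", "Sepal.Width")
)

data_list <- list()
for (i in names(columns)) {
  cols <- columns[[i]]
  data_list[[i]] <- lapply(cons, function(con) {
    # !! inlines the character vector now, so the stored query
    # no longer refers to the loop variable i
    tbl(con, i) %>% select(!!cols)
  })
}

# The first table can be used after the loop has finished
mtcars_1 <- collect(data_list[["mtcars"]][[1]])

# A filter can be added before collect() in the same way;
# min_mpg is a hypothetical threshold chosen for this example
min_mpg <- 25
efficient <- data_list[["mtcars"]][[1]] %>%
  filter(mpg > !!min_mpg) %>%
  collect()
```

Because filter() is applied before collect(), only the matching rows should be transferred from the database. Combining the results from both connections would still need local copies, for example by calling collect() on each element and then bind_rows().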
