Concatenating two database tables out of memory with dplyr

Time: 2014-11-04 14:38:28

Tags: r concatenation dplyr rbind

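I am using dplyr to manipulate two tables in a database. I would like to concatenate these two tables inside the database, out of memory, that is, without downloading the table data locally. In other words, given two tbl objects from a database source: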

> tbl ( db_src, "db_table_1" )
           a           b           c
1 -0.2246894 -1.48167912 -1.65099363
2  0.5559320 -0.87898575 -0.15634590
3  1.8469466 -0.01487524 -0.53098215
4 -0.6875051  0.23880967  0.01824621
5 -0.6735163  0.75485292  0.44154092


> tbl ( db_src, "db_table_2" )
           a          c
1  0.4287284 -0.3295925
2  0.5201492  0.3341251
3 -2.6355570  1.7916780
4 -1.3645337  1.3642276
5 -0.4954542 -0.6660001

I would like to obtain (in the database / out of memory):

> tbl ( db_src, "db_table_concatenated" )
           a                   b           c
1  -0.2246894   -1.48167912106676 -1.65099363
2   0.5559320  -0.878985746842256 -0.15634590
3   1.8469466 -0.0148752354840942 -0.53098215
4  -0.6875051   0.238809666690982  0.01824621
5  -0.6735163   0.754852923524198  0.44154092
6   0.4287284                  NA -0.32959248
7   0.5201492                  NA  0.33412510
8  -2.6355570                  NA  1.79167801
9  -1.3645337                  NA  1.36422764
10 -0.4954542                  NA -0.66600006

My current implementation uses dplyr::rbind_list, but this requires downloading both tables into memory as data frames:

library(dplyr)

# Connect to the database source:
db_src <- src_postgres(dbname = dbname, host = host, port = port, user = user, password = password)

# Reference table 1 in the database source:
db_tbl_1 <- tbl(db_src, "db_table_1")

# Reference table 2 in the database source:
db_tbl_2 <- tbl(db_src, "db_table_2")

# Data to write back to the DB (in-memory / local solution: both tables are downloaded first):
db_tbl_concatenated <- rbind_list(as.data.frame(db_tbl_1, n = -1), as.data.frame(db_tbl_2, n = -1))

Is there an out-of-memory / in-database solution?
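
For reference, the kind of in-database result I am after could be sketched as follows. This is only a minimal sketch under assumptions, not a confirmed solution. It assumes a current dplyr with the dbplyr, DBI and RSQLite packages installed, and it uses an in-memory SQLite database with made-up values in place of my Postgres source. The helper name db_tbl_2_padded is hypothetical. The idea is to add the missing column b to the second table as a typed NULL, align the column order, and let union_all() build a UNION ALL query that compute() materializes as a table on the database side.

```r
library(dplyr)
library(dbplyr)

# Assumption: an in-memory SQLite database stands in for the Postgres source.
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")
DBI::dbWriteTable(con, "db_table_1", data.frame(a = c(1, 2), b = c(3, 4), c = c(5, 6)))
DBI::dbWriteTable(con, "db_table_2", data.frame(a = c(7, 8), c = c(9, 10)))

# Lazy references; no rows are downloaded here.
db_tbl_1 <- tbl(con, "db_table_1")
db_tbl_2 <- tbl(con, "db_table_2")

# Add the missing column b as a typed NULL and match the column order of table 1.
# db_tbl_2_padded is a hypothetical helper name.
db_tbl_2_padded <- db_tbl_2 %>%
  mutate(b = as.numeric(NA)) %>%
  select(a, b, c)

# Build the UNION ALL lazily; show_query() prints the generated SQL.
db_tbl_concatenated <- union_all(db_tbl_1, db_tbl_2_padded)
show_query(db_tbl_concatenated)

# Materialize the result as a permanent table inside the database.
db_tbl_concatenated <- compute(
  db_tbl_concatenated,
  name = "db_table_concatenated",
  temporary = FALSE
)
```

After this runs, tbl(con, "db_table_concatenated") should refer to the combined table without the rows ever passing through R. Note that UNION ALL makes no promise about row order, so an explicit ordering column would be needed if the order shown above matters.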

0 answers:

No answers