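I am using dplyr to manipulate two tables in a database. I want to concatenate (row-bind) these two tables inside the database, i.e. out of memory, without downloading the table data locally. In other words, given two tbl objects from a database source: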
> tbl ( db_src, "db_table_1" )
a b c
1 -0.2246894 -1.48167912 -1.65099363
2 0.5559320 -0.87898575 -0.15634590
3 1.8469466 -0.01487524 -0.53098215
4 -0.6875051 0.23880967 0.01824621
5 -0.6735163 0.75485292 0.44154092
> tbl ( db_src, "db_table_2" )
a c
1 0.4287284 -0.3295925
2 0.5201492 0.3341251
3 -2.6355570 1.7916780
4 -1.3645337 1.3642276
5 -0.4954542 -0.6660001
I would like to obtain (in the database / out of memory):
> tbl ( db_src, "db_table_concatenated" )
a b c
1 -0.2246894 -1.48167912106676 -1.65099363
2 0.5559320 -0.878985746842256 -0.15634590
3 1.8469466 -0.0148752354840942 -0.53098215
4 -0.6875051 0.238809666690982 0.01824621
5 -0.6735163 0.754852923524198 0.44154092
6 0.4287284 NA -0.32959248
7 0.5201492 NA 0.33412510
8 -2.6355570 NA 1.79167801
9 -1.3645337 NA 1.36422764
10 -0.4954542 NA -0.66600006
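To make the target concrete: in SQL terms, I believe the result above corresponds to a UNION ALL in which the column missing from the second table (b) is padded with NULL. The following is only a minimal sketch under assumptions. It uses an in-memory SQLite database via the DBI and RSQLite packages as a stand-in for my Postgres source, and the values are made up. I would expect the same statement to carry over to Postgres, but I have not verified that, and it is raw SQL rather than dplyr.

```r
library(DBI)

# Assumption: an in-memory SQLite database stands in for the real Postgres source.
con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Two small tables shaped like db_table_1 and db_table_2 (illustrative values only).
dbWriteTable(con, "db_table_1",
             data.frame(a = c(1, 2), b = c(10, 20), c = c(100, 200)))
dbWriteTable(con, "db_table_2",
             data.frame(a = c(3, 4), c = c(300, 400)))

# Row-bind inside the database: pad the missing column b with NULL
# and stack the two tables with UNION ALL. No rows are pulled into R here.
dbExecute(con, "
  CREATE TABLE db_table_concatenated AS
  SELECT a, b, c FROM db_table_1
  UNION ALL
  SELECT a, NULL AS b, c FROM db_table_2
")

# Read the result back only to inspect it; NULLs in b come back as NA.
result <- dbReadTable(con, "db_table_concatenated")
dbDisconnect(con)
```

What I am looking for is a way to express this through dplyr on the tbl objects, rather than by writing the SQL by hand.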
My current implementation uses dplyr::rbind_list, but this requires downloading the data frames into memory:
# Connecting to database src:
db_src <- src_postgres(dbname = dbname, host = host, port = port, user = user, password = password)
# Load table1 from database src:
db_tbl_1 <- tbl ( db_src, "db_table_1" )
# Load table2 from database src:
db_tbl_2 <- tbl ( db_src, "db_table_2" )
# Data to write to the DB (in-memory / local solution):
db_tbl_concatenated <- rbind_list(
  as.data.frame(db_tbl_1, n = -1),
  as.data.frame(db_tbl_2, n = -1)
)
Is there an out-of-memory / in-database solution?