R条件连接

时间:2018-07-05 11:08:47

标签: r join conditional tibble

有没有一种方法可以在R中联接和更新列?示例:

tbl1 <- tibble(ID = LETTERS[1:3],
       VAL = rep(NA, 3),
       tbl1_df = list(tibble(A = rnorm(3),
                             B = rnorm(3))))

tbl2 <- tibble(ID = LETTERS[1:3],
               VAL = c(1, 2, 3),
               tbl2_df = list(tibble(A = rnorm(3),
                                     B = rnorm(3))))

tbl3 <- tibble(ID = LETTERS[1:3],
               VAL = c(1, 2, 3),
               tbl3_df = list(tibble(A = rnorm(3),
                                     B = rnorm(3))))

我想将这些小节连接在一起,并使用具有值的表之一更新VAL。表在VAL中始终具有相同的值,但我并不总是知道它们在哪个表中。是否可以将VAL列强制在一起或将VAL列与值所在的小标题之一保持距离?

答案应该看起来像这样,如上所述,表VAL列来自哪个表,表具有相同的VAL或NA都没有关系。

tibble(ID = LETTERS[1:3],
                 VAL = c(1, 2, 3),
                 tbl1_df = list(tibble(A = rnorm(3),
                                       B = rnorm(3))),
                 tbl2_df = list(tibble(A = rnorm(3),
                                       B = rnorm(3))),
                 tbl3_df = list(tibble(A = rnorm(3),
                                       B = rnorm(3))))

# A tibble: 3 x 5
  ID      VAL tbl1_df          tbl2_df          tbl3_df         
  <chr> <dbl> <list>           <list>           <list>          
1 A        1. <tibble [3 x 2]> <tibble [3 x 2]> <tibble [3 x 2]>
2 B        2. <tibble [3 x 2]> <tibble [3 x 2]> <tibble [3 x 2]>
3 C        3. <tibble [3 x 2]> <tibble [3 x 2]> <tibble [3 x 2]>

2 个答案:

答案 0 :(得分:0)

怎么样?

library(purrr)

list(tbl1, tbl2, tbl3) %>% 
  reduce(full_join, by = "ID") %>%   #merge all tables
  select_if(~!all(is.na(.))) %>%     #drop columns having all NA value
  select(-starts_with("VAL."))       #keep only one 'VAL' column and drop remaining repetitive columns

给出

# A tibble: 3 x 5
  ID    tbl1_df          tbl2_df            VAL tbl3_df         
  <chr> <list>           <list>           <dbl> <list>          
1 A     <tibble [3 x 2]> <tibble [3 x 2]>  1.00 <tibble [3 x 2]>
2 B     <tibble [3 x 2]> <tibble [3 x 2]>  2.00 <tibble [3 x 2]>
3 C     <tibble [3 x 2]> <tibble [3 x 2]>  3.00 <tibble [3 x 2]>

答案 1 :(得分:0)

基于Jaap的注释,可以通过使用purrr中的reduce命令和dplyr中的full_join将这些小标题合并为一个小标题。 然后的问题是如何只获取存在的VAL,而不是为VAL设置3列,但并非所有列都有数据。一种简单的方法是使用dplyr的coalesce命令,该命令采用第一个非缺失值。此步骤中引入的问题是,如果数据类型均为NA,则它们均为BOOLEAN,因此可以使用as.numeric来解决。最后,删除后面带有字母的其他VAL列。

library(dplyr)
library(purrr)

reduce(list(tbl1, tbl2, tbl3), full_join, by = "ID") %>% # Combine the tibbles into a single tibble
  mutate(VAL= coalesce(as.numeric(VAL.x), as.numeric(VAL.y), as.numeric(VAL))) %>% # Create a variable for VAL which takes the first non missing using the coalesce function
  select(-starts_with("Val.")) # Delete the columns for VAL which were created when joining and have a name of VAL. and then a letter