Joining two data frames to fill in missing data

Time: 2020-07-23 21:27:17

Tags: r dataframe join dplyr merge

I have two different data frames:

library(dplyr)  # provides tibble(), full_join() and %>%

df1 <- tibble(group = c(rep(1, 3), rep(2, 4), rep(1, 3)),
              id = paste0("minutesPrompt", c(1, 2, 3, 1, 2, 3, 4, 1, 2, 3)),
              number = c(rep("a", 3), rep("b", 4), rep("c", 3)),
              minutesPrompt = c(1, 2, 4, 9, 18, 27, 36, 2, 3, 5),
              timestamp = rep("xxxxxx", 10),
              text1 = c("String", rep(NA_character_, 6), rep("String", 3)),
              text2 = c(NA_character_, "String", rep(NA_character_, 5), "String", rep(NA_character_, 2)),
              text3 = c(rep(NA_character_, 2), "String", rep(NA_character_, 7)))

df2 <- tibble(group = rep(2, 7),
              id = paste0("minutesPrompt", c(1, 2, 3, 4, 1, 2, 3)),
              number = c(rep("b", 4), rep("x", 3)),
              minutesPrompt = NA,
              timestamp = rep("xxxxxx", 7),
              text1 = c("String", rep(NA_character_, 6)),
              text2 = c(rep(NA_character_, 2), "String", rep(NA_character_, 4)),
              text3 = c(NA_character_, "String", rep(NA_character_, 5)))
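  1. df1 (first picture) is very large: this data frame contains many variables and holds values for 3 different groups. It also has 7 rows for each participant, identified by id.
  2. In contrast, df2 (second picture) contains the variables for only one group. The datasets also differ in that df1 is missing some values (yellow). The strings that should be transferred into these empty cells are contained in df2 (orange).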

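My plan was to do a full join so that the missing information in df1 for "timestamp", "text1", "text2", and so on up to "text7" gets replaced with the values provided in df2. This is what I have tried: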

full_join(df1, df2) %>%
   group_by("id", "number")

However, this does not replace my missing cells (highlighted in yellow) with the strings from df2.

1 Answer:

Answer 0: (score: 1)

We can use the data.table package:

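library(data.table)

# Update join: for every df1 row that matches a df2 row on id and number,
# overwrite df1's columns with df2's values (the i.* columns), by reference.
setDT(df1)[setDT(df2), `:=` ( timestamp  = i.timestamp,
                              text1 = i.text1,
                              text2 = i.text2,
                              text3 = i.text3 ), 
            on = .(id, number)][] # consider adding the `group` column to `on` as well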
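
Since the question is tagged dplyr, here is a rough sketch of a similar idea with a left join followed by coalesce(). This is not the approach from the answer above, only one possible alternative. The toy tibbles x and y and their values are made up for illustration and are much smaller than df1 and df2. Unlike the update join, coalesce() only fills cells that are NA in the first table and leaves existing values alone.

```r
library(dplyr)

# Hypothetical toy data standing in for df1 (x) and df2 (y)
x <- tibble(id     = c("p1", "p2", "p3"),
            number = c("a", "b", "b"),
            text1  = c("keep", NA, NA),
            text2  = c(NA, NA, "own"))
y <- tibble(id     = c("p2", "p3", "p9"),
            number = c("b", "b", "x"),
            text1  = c("fill", NA, "unused"),
            text2  = c(NA, "other", NA))

filled <- x %>%
  # keep every row of x; columns coming from y get the ".y" suffix
  left_join(y, by = c("id", "number"), suffix = c("", ".y")) %>%
  # take x's value where present, otherwise fall back to y's value
  mutate(text1 = coalesce(text1, text1.y),
         text2 = coalesce(text2, text2.y)) %>%
  # drop the helper columns that came from y
  select(-ends_with(".y"))

filled
```

With the real data, the same pattern would presumably use df1 and df2, with group, id and number as join keys and one coalesce() call per column to fill (timestamp, text1, text2, ...).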