Replace values in a data.table with values from another data.table

Asked: 2017-05-10 21:58:24

Tags: r data.table

I have a dataset with 300 columns and 1000 rows, plus a corresponding codebook, both as data.tables. For simplicity, I will show only 3 columns of each.

library(data.table)

# No seed is set here, so the sampled values differ on every run
dt <- data.table(id = 1:10,
                 a  = sample(c(1, 2, 3), 10, replace = TRUE),
                 b  = sample(c(1, 2),    10, replace = TRUE),
                 c  = sample(1:5,        10, replace = TRUE))

    id a b c
 1:  1 2 1 2
 2:  2 2 1 1
 3:  3 3 1 1
 4:  4 3 1 1
 5:  5 1 2 5
 6:  6 2 1 3
 7:  7 1 2 3
 8:  8 1 1 2
 9:  9 2 1 5
10: 10 3 2 4

cb <- data.table(var = c(rep("a", 3), rep("b", 2), rep("c", 5)),
                 val = c(1,2,3,1,2,1,2,3,4,5),
                 des = c("red", "blue", "yellow", "yes","no","K", "Na","Ag","Au","Si"))

    var val    des
 1:   a   1    red
 2:   a   2   blue
 3:   a   3 yellow
 4:   b   1    yes
 5:   b   2     no
 6:   c   1      K
 7:   c   2     Na
 8:   c   3     Ag
 9:   c   4     Au
10:   c   5     Si

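In cb, var names the corresponding column in dt, val lists the values that occur in that column, and des gives the description for each value. I want to modify dt by replacing its values with the matching des from cb. The result should look something like this: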

    id      a   b  c
 1:  1    red yes Na
 2:  2 yellow  no Ag
 3:  3   blue yes Ag
 4:  4    red yes Au
 5:  5   blue yes Ag
 6:  6   blue  no Au
 7:  7 yellow yes Si
 8:  8   blue  no Ag
 9:  9    red  no  K
10: 10 yellow  no Ag

How can I do this efficiently, without making my computer sound like it has pistons built in?

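The reason is that I have pre-written code for analyzing the data, and it needs the actual values to run. A solution may also be useful in general, since I am often given data together with a codebook, though usually with fewer variables.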

3 answers:

Answer 0 (score: 3)

You can try

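# Reshape dt to long form, join the codebook on (var, val), then reshape back to wide
long <- melt(dt, id.vars = "id", variable.name = "var", value.name = "val")
dcast(long[cb, on = c("var", "val")], id ~ var, value.var = "des")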
#     id      a   b  c
#  1:  1    red yes  K
#  2:  2 yellow  no Si
#  3:  3    red yes Si
#  4:  4    red  no Au
#  5:  5    red  no Ag
#  6:  6   blue yes  K
#  7:  7   blue  no Si
#  8:  8 yellow yes Na
#  9:  9   blue yes Ag
# 10: 10 yellow yes Si
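
The three steps behind that one-liner can be sketched separately. This is only an illustration on a small fixed table (the values below are made up for the example, not the random data from the question), and it uses an update join in step 2 so that every row of dt is kept even when a code is missing from the codebook:

```r
library(data.table)

# Small fixed inputs, chosen only for illustration
dt <- data.table(id = 1:3, a = c(1, 3, 2), b = c(2, 1, 1))
cb <- data.table(var = c("a", "a", "a", "b", "b"),
                 val = c(1, 2, 3, 1, 2),
                 des = c("red", "blue", "yellow", "yes", "no"))

# Step 1: long form, one row per (id, column name, value)
long <- melt(dt, id.vars = "id", variable.name = "var",
             value.name = "val", variable.factor = FALSE)

# Step 2: attach the description for each (var, val) pair by reference;
# pairs that are absent from the codebook are left as NA
long[cb, des := i.des, on = c("var", "val")]

# Step 3: back to wide form, now holding descriptions instead of codes
wide <- dcast(long, id ~ var, value.var = "des")
print(wide)
```

Printing long after step 2 is a quick way to check that each code picked up the description you expect before reshaping back.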

Answer 1 (score: 3)

Another option is to do several merges plus an update by reference:

cb_dc <- data.table::dcast(cb, des~var, value.var = "val")
cols = c("a","b","c")
dt[, (cols) := lapply(cols, function(x) cb_dc[dt, des, on = x]) ]

 #  id      a   b  c
 #1:  1    red yes Si
 #2:  2   blue yes Na
 #3:  3   blue  no Au
 #4:  4 yellow yes  K
 #5:  5    red  no Na
 #6:  6 yellow yes Na
 #7:  7 yellow  no  K
 #8:  8   blue  no Na
 #9:  9   blue yes Si
#10: 10    red  no Na

Data:

set.seed(1)
dt <- data.table(id = 1:10,
                 a  = sample(c(1, 2, 3), 10, replace = TRUE),
                 b  = sample(c(1, 2),    10, replace = TRUE),
                 c  = sample(1:5,        10, replace = TRUE))
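
With 300 columns it may be convenient to take the column names from the codebook rather than typing them. The following is a minimal sketch of the same per-column update idea under that assumption; note that it swaps the join for base R's match() as the lookup, and that the small fixed inputs are illustrative only:

```r
library(data.table)

# Small fixed inputs, chosen only for illustration
dt <- data.table(id = 1:4,
                 a  = c(1, 3, 2, 1),
                 b  = c(1, 2, 1, 1),
                 c  = c(2L, 3L, 3L, 5L))
cb <- data.table(var = c(rep("a", 3), rep("b", 2), rep("c", 5)),
                 val = c(1, 2, 3, 1, 2, 1, 2, 3, 4, 5),
                 des = c("red", "blue", "yellow", "yes", "no",
                         "K", "Na", "Ag", "Au", "Si"))

# Every column that the codebook describes and that actually exists in dt
cols <- intersect(unique(cb$var), names(dt))

# Replace each column in place with the descriptions from its codebook slice.
# match() yields NA for codes that are missing from the codebook.
for (col in cols) {
  lookup <- cb[var == col]
  set(dt, j = col, value = lookup$des[match(dt[[col]], lookup$val)])
}

print(dt)
```

Because set() modifies dt by reference, no copy of the full table is made per column, which should matter more as the number of columns grows.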

Answer 2 (score: 1)

This dplyr answer essentially joins a filtered slice of the codebook once for each of the three columns.

library(dplyr)

dt %>% 
  left_join(cb %>% filter(var == "a"), by=c("a" = "val")) %>% 
  left_join(cb %>% filter(var == "b"), by=c("b" = "val")) %>% 
  left_join(cb %>% filter(var == "c"), by=c("c" = "val")) %>%
  select(id, des.x, des.y, des) %>%
  rename(a = des.x, b = des.y, c = des)