What is the "data.table" way to do this join/merge?

Time: 2015-08-31 17:32:13

Tags: r join merge left-join data.table

I have a "dictionary" table like this:

library(data.table)

dict <- data.table(
  Nickname = c("Abby", "Ben", "Chris", "Dan", "Ed"),
  Name = c("Abigail", "Benjamin", "Christopher", "Daniel", "Edward")
)
dict
#    Nickname        Name
# 1:     Abby     Abigail
# 2:      Ben    Benjamin
# 3:    Chris Christopher
# 4:      Dan      Daniel
# 5:       Ed      Edward

And a "data" table like this:

dat <- data.table(
  Friend1 = c("Abby", "Ben", "Ben", "Chris"),
  Friend2 = c("Ben", "Ed", NA, "Ed"),
  Friend3 = c("Ed", NA, NA, "Dan"),
  Friend4 = c("Dan", NA, NA, NA)
)
dat
#    Friend1 Friend2 Friend3 Friend4
# 1:    Abby     Ben      Ed     Dan
# 2:     Ben      Ed      NA      NA
# 3:     Ben      NA      NA      NA
# 4:   Chris      Ed     Dan      NA

I want to produce a data.table that looks like this:

result <- data.table(
  Friend1.Nickname = c("Abby", "Ben", "Ben", "Chris"),
  Friend1.Name = c("Abigail", "Benjamin", "Benjamin", "Christopher"),
  Friend2.Nickname = c("Ben", "Ed", NA, "Ed"),
  Friend2.Name = c("Benjamin", "Edward", NA, "Edward"),
  Friend3.Nickname = c("Ed", NA, NA, "Dan"),
  Friend3.Name = c("Edward", NA, NA, "Daniel"),
  Friend4.Nickname = c("Dan", NA, NA, NA),
  Friend4.Name = c("Daniel", NA, NA, NA)
)
result
# sorry, word wrapping makes this too annoying to copy

Here is the solution I came up with:

friend_vars <- paste0("Friend", 1:4)
friend_nicks <- paste0(friend_vars, ".Nickname")
friend_names <- paste0(friend_vars, ".Name")
setnames(dat, friend_vars, friend_nicks)
for (i in 1:4) {
  # Parenthesised LHS replaces the deprecated `with = FALSE` form of :=
  dat[, (friend_names[i]) := dict$Name[match(dat[[friend_nicks[i]]], dict$Nickname)]]
}

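Is there a more "data.table" way to do this? I'm sure this is fine and efficient, but it is ugly to read, and apart from data.table's in-place assignment I feel like I'm not making full use of what the package offers.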

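I'm also not a very strong SQL user, and I'm not very comfortable with join terminology. I suspect the question "Data.table - left outer join on multiple tables" could be useful here, but I'm not sure how to apply it to my situation.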

4 answers:

Answer 0 (score: 6)

Using data.table 1.9.5:

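for (nm in names(dat)) {
    # on= takes c(<column in dat> = <column in dict>); setNames() builds that
    # named vector without modifying the "Nickname" literal by reference
    on = setNames("Nickname", nm)
    dat[dict, paste0(nm, ".Name") := i.Name, on=on]
}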

We can join using on= instead of setting keys. Afterwards you can reorder the columns with setcolorder().

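I would avoid reshaping the data unless absolutely necessary. This is where updating while joining comes in really handy. And now that there is an on= argument, I couldn't resist posting an answer :-).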
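As a minimal sketch of the update-while-joining approach described above, the following rebuilds the question's two tables, adds one Name column per friend column, and then interleaves the columns with setcolorder(). It assumes a data.table version that supports on= (1.9.6 or later on CRAN); the variable friend_vars and the use of setNames() to build the on= vector are illustrative choices, not part of the original answer.

```r
library(data.table)

dict <- data.table(
  Nickname = c("Abby", "Ben", "Chris", "Dan", "Ed"),
  Name = c("Abigail", "Benjamin", "Christopher", "Daniel", "Edward")
)
dat <- data.table(
  Friend1 = c("Abby", "Ben", "Ben", "Chris"),
  Friend2 = c("Ben", "Ed", NA, "Ed"),
  Friend3 = c("Ed", NA, NA, "Dan"),
  Friend4 = c("Dan", NA, NA, NA)
)

# Capture the original column names before any columns are added
friend_vars <- names(dat)

for (nm in friend_vars) {
  # Update join: rows of dat whose `nm` value matches dict$Nickname receive
  # dict's Name (referred to as i.Name); unmatched rows stay NA
  dat[dict, paste0(nm, ".Name") := i.Name, on = setNames("Nickname", nm)]
}

# Interleave each FriendN column with its FriendN.Name column
setcolorder(dat, c(rbind(friend_vars, paste0(friend_vars, ".Name"))))
dat
```

Because := modifies dat by reference, no copy of the table is made inside the loop; the column names differ from the question's result only in that the nickname columns keep their original names.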

Answer 1 (score: 2)

I didn't find a solution that exactly matches your result, but you might be able to work with something like this:

dat[, id := .I]
dat.m <- melt(dat, id.vars='id', variable.name='Friend', value.name='Nickname')
setkey(dict, Nickname)
dat.m[, Name := dict[Nickname, Name]]
dat.m
#     id  Friend Nickname        Name
#  1:  1 Friend1     Abby     Abigail
#  2:  2 Friend1      Ben    Benjamin
#  3:  3 Friend1      Ben    Benjamin
#  4:  4 Friend1    Chris Christopher
#  5:  1 Friend2      Ben    Benjamin
#  6:  2 Friend2       Ed      Edward
#  7:  3 Friend2       NA          NA
#  8:  4 Friend2       Ed      Edward
#  9:  1 Friend3       Ed      Edward
# 10:  2 Friend3       NA          NA
# 11:  3 Friend3       NA          NA
# 12:  4 Friend3      Dan      Daniel
# 13:  1 Friend4      Dan      Daniel
# 14:  2 Friend4       NA          NA
# 15:  3 Friend4       NA          NA
# 16:  4 Friend4       NA          NA

The id variable is just a placeholder so that I can melt the data.table.
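If the wide layout is needed after all, one possible follow-up (a sketch, not part of the original answer) is to cast the long table back with dcast() using two value columns. This assumes a data.table version that supports on= and multiple value.var columns in dcast() (1.9.6 or later); the name wide is hypothetical, and the resulting columns are named like Nickname_Friend1 and Name_Friend1 rather than Friend1.Nickname and Friend1.Name.

```r
library(data.table)

dict <- data.table(
  Nickname = c("Abby", "Ben", "Chris", "Dan", "Ed"),
  Name = c("Abigail", "Benjamin", "Christopher", "Daniel", "Edward")
)
dat <- data.table(
  Friend1 = c("Abby", "Ben", "Ben", "Chris"),
  Friend2 = c("Ben", "Ed", NA, "Ed"),
  Friend3 = c("Ed", NA, NA, "Dan"),
  Friend4 = c("Dan", NA, NA, NA)
)

# Row id so that the long table can be cast back to one row per original row
dat[, id := .I]
dat.m <- melt(dat, id.vars = "id", variable.name = "Friend", value.name = "Nickname")

# Look up the full name with an update join (used here instead of setkey)
dat.m[dict, Name := i.Name, on = "Nickname"]

# Cast back to wide with both value columns; each id/Friend pair is unique,
# so no aggregation takes place
wide <- dcast(dat.m, id ~ Friend, value.var = c("Nickname", "Name"))
wide
```

The columns of wide can then be renamed with setnames() and reordered with setcolorder() if the exact layout from the question is required.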

Answer 2 (score: 2)

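# Key dict by Nickname so that dict[J(x)] looks up each value of x
setkey(dict, Nickname)
# For every column of dat, add a matching <column>.Name column
dat[, paste(names(dat), "Name", sep = ".") := lapply(.SD, function(x) dict[J(x)]$Name)]
# Move each Name column to sit right after its nickname column
setcolorder(dat, c(1, 5, 2, 6, 3, 7, 4, 8))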
dat
#    Friend1 Friend1.Name Friend2 Friend2.Name Friend3 Friend3.Name Friend4 Friend4.Name
# 1:    Abby      Abigail     Ben     Benjamin      Ed       Edward     Dan       Daniel
# 2:     Ben     Benjamin      Ed       Edward      NA           NA      NA           NA
# 3:     Ben     Benjamin      NA           NA      NA           NA      NA           NA
# 4:   Chris  Christopher      Ed       Edward     Dan       Daniel      NA           NA

Answer 3 (score: 1)

In base R, super ugly:

cbind(dat, lapply(dat, function(x){dict$Name[match(x, dict$Nickname)]}))

   Friend1 Friend2 Friend3 Friend4          V2       NA     NA     NA
1:    Abby     Ben      Ed     Dan     Abigail Benjamin Edward Daniel
2:     Ben      Ed      NA      NA    Benjamin   Edward     NA     NA
3:     Ben      NA      NA      NA    Benjamin       NA     NA     NA
4:   Chris      Ed     Dan      NA Christopher   Edward Daniel     NA