Extend suffixes in merge to all non-by columns

Time: 2013-10-02 16:32:12

Tags: r merge data.table

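The suffixes argument of merge applies only to column names that the two tables have in common. Is there any way to extend it to the remaining columns as well, without manually renaming the columns before the merge?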

That is:

library(data.table)

df1 <- data.table(
a = c(1,2,3,4,5,6),
b = c('a','b','f','e','r','h'),
d = c('q','l','o','n','q','z')
)

df2 <- data.table(
a = c(1,2,3,4,5,6),
d = c('q','l','o','n','q','z')
)

colnames(merge(df1,df2, by = 'a', suffixes = c("1","2")))
#[1] "a"  "b"  "d1" "d2" what it does
#[1] "a"  "b1" "d1" "d2" what I'd like it to do

The approach I am currently working with is similar to @mrip's answer.

df1 <- data.table(
a = c(1,2,3,4,5,6),
b = c('a','b','f','e','r','h'),
r = c('a','b','f','e','r','h'),
d = c('q','l','o','n','q','z')
)

df2 <- data.table(
a = c(1,2,3,4,5,6),
c = c('a','b','f','e','r','h'),
q = c('a','b','f','e','r','h'),
d = c('q','l','o','n','q','z')
)

dfmerge <- (merge(df1,df2, by = c("a"), suffixes = c("1","2")))

setnames(
dfmerge,
setdiff(names(df1),names(df2)),
paste0(setdiff(names(df1),names(df2)),"1")
)

setnames(
dfmerge,
setdiff(names(df2),names(df1)),
paste0(setdiff(names(df2),names(df1)),"2")
)

colnames(dfmerge)
#[1] "a"  "b1" "r1" "d1" "c2" "q2" "d2"

3 Answers:

Answer 0: (score: 11)

A simple solution:

mrg<-(merge(df1,df2, by = 'a', suffixes = c("1","2")))
setnames(mrg,paste0(names(mrg),ifelse(names(mrg) %in% setdiff(names(df1),names(df2)),"1","")))
setnames(mrg,paste0(names(mrg),ifelse(names(mrg) %in% setdiff(names(df2),names(df1)),"2","")))

> names(mrg)
[1] "a"  "b1" "d1" "d2"

Edit: thanks to Ricardo Saporta for comments that cleaned this up considerably and taught me a few new tricks!
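The merge-then-rename idea above can be wrapped in a small function. This is only a sketch: merge_suffix_all is a hypothetical name, not part of data.table, and it assumes that columns unique to one table do not collide with an already suffixed name in the result.

```r
library(data.table)

# Hypothetical helper (not part of data.table): merge first, then suffix
# every non-key column that merge itself left untouched.
merge_suffix_all <- function(x, y, by, suffixes = c("1", "2"), ...) {
  only_x <- setdiff(names(x), c(names(y), by))  # columns found only in x
  only_y <- setdiff(names(y), c(names(x), by))  # columns found only in y
  out <- merge(x, y, by = by, suffixes = suffixes, ...)
  # Guard against empty sets: paste0(character(0), "1") would return "1"
  if (length(only_x)) setnames(out, only_x, paste0(only_x, suffixes[1]))
  if (length(only_y)) setnames(out, only_y, paste0(only_y, suffixes[2]))
  out
}

df1 <- data.table(a = 1:3, b = c("a", "b", "f"), d = c("q", "l", "o"))
df2 <- data.table(a = 1:3, c = c("x", "y", "z"), d = c("q", "l", "o"))

res <- merge_suffix_all(df1, df2, by = "a")
names(res)
#[1] "a"  "b1" "d1" "c2" "d2"
```

Because the renaming happens on the merged result, df1 and df2 are never touched.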

Answer 1: (score: 5)

Try the following:

colnames(
  mergeWithSuffix(df1,df2, by = 'a', suffixes = c("1","2"))
)
[1] "a"   "b.1" "d.1" "d.2"

Note that the original data.frames are left unharmed.

colnames(df1)
[1] "a" "b" "d"

colnames(df2)
[1] "a" "d"

The functions are as follows:

require(data.table)

mergeWithSuffix <- function(x, y, by, suffixes=NULL, ...) {

  # Add Suffixes
  mkSuffix(x, suffixes[[1]], merge.col=by)
  mkSuffix(y, suffixes[[2]], merge.col=by)

  # Merge
  ret <- merge(x, y, by = by, suffixes = NULL, ...)

  # Remove Suffixes
  undoSuffix(x, suffixes[[1]], merge.col=by)
  undoSuffix(y, suffixes[[2]], merge.col=by)
  return(ret)
}

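mkSuffix <- function(x, sfx, sep=".", merge.col=NULL)  {
  nms <- setdiff(names(x), merge.col)
  # Use the sep argument rather than a hard-coded "."
  setnames(x, nms, paste(nms, sfx, sep=sep))
}

undoSuffix <- function(x, sfx, sep=".", merge.col=NULL) {
  nms <- setdiff(names(x), merge.col)
  # The pattern is a regex, so a "." in sep matches any character; that is
  # harmless here because mkSuffix has just appended exactly sep + sfx
  setnames(x, nms, sub(paste0(sep, sfx, "$"), "", nms))
}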

Note that setnames works by reference, so the overhead is almost negligible. Also, as discussed elsewhere, this works equally well on data.frames and data.tables.
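To make "by reference" concrete, here is a minimal sketch with made-up data (dt, alias and safe are illustrative names). It shows that setnames changes the object in place, so every name bound to that object sees the change, while copy() gives an independent table. This is also why the wrapper above has to undo its suffixes: while merge runs, the caller's own tables carry the temporary names.

```r
library(data.table)

dt <- data.table(a = 1:3, b = letters[1:3])
alias <- dt                 # a second name for the same object, not a copy

setnames(dt, "b", "b.1")    # renames in place, no copy of the data is made
names(alias)                # the alias sees the new name as well
#[1] "a"   "b.1"

safe <- copy(dt)            # an explicit copy is independent of dt
setnames(safe, "b.1", "b")
names(dt)                   # dt keeps the suffixed name
#[1] "a"   "b.1"
```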

Answer 2: (score: 1)

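This is an interesting question. I doubt that extending merge itself will be a straightforward solution, unless Matt Dowle and Co. think it is worth implementing in merge.data.table.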

Here is one approach that comes to mind:

DTs <- c("df1", "df2")
suffixes <- seq_along(DTs)

for (i in seq_along(DTs)) {
  Name <- setdiff(colnames(get(DTs[i])), "a")
  setnames(get(DTs[i]), Name, paste(Name, suffixes[i], sep = "."))
}

merge(df1, df2, by = "a") # Will obviously work as you expect now
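The loop above renames df1 and df2 permanently, because setnames modifies them in place. If the originals need to stay as they are, one possible variant is to rename copies instead. The sketch below assumes that copying the tables is affordable; add_suffix is a hypothetical helper, not a data.table function.

```r
library(data.table)

df1 <- data.table(a = 1:3, b = c("a", "b", "f"), d = c("q", "l", "o"))
df2 <- data.table(a = 1:3, d = c("q", "l", "o"))

# Hypothetical helper: return a renamed copy and leave the input untouched.
add_suffix <- function(dt, sfx, by, sep = ".") {
  out <- copy(dt)
  nms <- setdiff(names(out), by)
  if (length(nms)) setnames(out, nms, paste(nms, sfx, sep = sep))
  out
}

tables  <- list(df1, df2)
renamed <- lapply(seq_along(tables),
                  function(i) add_suffix(tables[[i]], i, by = "a"))

# Fold the renamed copies together, one merge at a time
merged <- Reduce(function(x, y) merge(x, y, by = "a"), renamed)

names(merged)
#[1] "a"   "b.1" "d.1" "d.2"
names(df1)    # unchanged
#[1] "a" "b" "d"
```

Using Reduce means the same code handles more than two tables, as long as they all share the key column.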