Is there a way to append the values of columns from two different data frames?

Asked: 2019-07-10 09:34:13

Tags: r dplyr

I have two data frames.
df1:

DAT1 DAT3     DAT4    ...
 1   this is  this is
 2   this is  this is
 3   this is  this is

df2:

DAT1 DAT3       DAT4      ... 
 1   a comment  a comment
 2   a comment  a comment
 3   a comment  a comment

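I want to find a way to append the columns of the second data frame to the matching columns of the first (I know the names and positions of the columns I need to append), and get an updated version of the first data frame like this: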
df3:

DAT1 DAT3               DAT4               ... 
 1   this is a comment  this is a comment  
 2   this is a comment  this is a comment
 3   this is a comment  this is a comment

The problem is that the real data frames have many rows and columns, so a for() loop is really inefficient.

4 answers:

Answer 0 (score: 2)

We can use Map, which applies paste to each pair of matching columns:

cols <- c("DAT3", "DAT4")
df3 <- df1
df3[cols] <- Map(paste, df1[cols], df2[cols])

df3
#  DAT1              DAT3              DAT4
#1    1 this is a comment this is a comment
#2    2 this is a comment this is a comment
#3    3 this is a comment this is a comment
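As a rough sketch of what Map is doing here: it calls paste once per pair of same-named columns, so each call is vectorised over the rows. The sample data and the ": " separator below are assumptions made for illustration and are not part of the answer.

```r
# Illustrative data (assumed; mirrors the layout in the question)
df1 <- data.frame(DAT1 = 1:3,
                  DAT3 = c("this is", "this is", "this is"),
                  DAT4 = c("this is", "this is", "this is"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(DAT1 = 1:3,
                  DAT3 = c("a comment", "a comment", "a comment"),
                  DAT4 = c("a comment", "a comment", "a comment"),
                  stringsAsFactors = FALSE)

cols <- c("DAT3", "DAT4")

# Map(f, x, y) calls f(x[[1]], y[[1]]), f(x[[2]], y[[2]]), ...
# MoreArgs passes the same extra argument (here sep) to every call.
pasted <- Map(paste, df1[cols], df2[cols], MoreArgs = list(sep = ": "))

# The result is a list named after the first input, so it can be
# assigned straight back into the selected columns.
df3 <- df1
df3[cols] <- pasted
```

Only the columns listed in cols are touched, so DAT1 and any other columns are carried over from df1 unchanged.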

Answer 1 (score: 2)

We can use base R without a loop

cols <- c("DAT3", "DAT4")     
df3 <- df1
df3[cols] <- matrix(paste(as.matrix(df1[cols]), as.matrix(df2[cols])), nrow = nrow(df1))
df3
#  DAT1              DAT3              DAT4
#1    1 this is a comment this is a comment
#2    2 this is a comment this is a comment
#3    3 this is a comment this is a comment
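The matrix approach relies on paste treating each matrix as a plain vector in column-major order, after which matrix() restores the shape. A small sketch with made-up values (the names a, b, flat and res are hypothetical) shows the two steps separately:

```r
# Two small character matrices (made-up values for illustration)
a <- matrix(c("a1", "a2", "b1", "b2"), nrow = 2)  # columns: (a1, a2), (b1, b2)
b <- matrix(c("x1", "x2", "y1", "y2"), nrow = 2)  # columns: (x1, x2), (y1, y2)

# paste() works element by element on the underlying vectors
# (column-major order) and returns a vector without dimensions
flat <- paste(a, b)

# Rebuilding with the original row count restores the column layout
res <- matrix(flat, nrow = nrow(a))
```

Note that as.matrix() on a data frame with mixed column types converts everything to character, so it is safer to select only the text columns before converting.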

Data

df1 <- structure(list(DAT1 = 1:3, DAT3 = c("this is", "this is", "this is"
), DAT4 = c("this is", "this is", "this is")), class = "data.frame",
row.names = c(NA, 
-3L))

df2 <- structure(list(DAT1 = 1:3, DAT3 = c("a comment", "a comment", 
"a comment"), DAT4 = c("a comment", "a comment", "a comment")),
   class = "data.frame", row.names = c(NA, 
-3L))

Answer 2 (score: 1)

If your data is ordered (the rows of both data frames are in the same order), I would do something like this:

# initialize the data.frame with the id column
df3 <- data.frame(DAT1 = df1$DAT1)

#then run a for-loop with the names you know you need to concatenate
for (i in c('DAT3', 'DAT4')) {
  df3[[i]] <- paste(df1[[i]], df2[[i]])
}

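The for loop only iterates over the column names. The core of the code is paste, which is vectorized and fast, so you will not run into any speed problems.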

df3
#  DAT1              DAT3              DAT4
#1    1 this is a comment this is a comment
#2    2 this is a comment this is a comment
#3    3 this is a comment this is a comment
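If the rows are not guaranteed to be in the same order, one possible way to align them first is match() on the id column. This is a sketch under the assumption that DAT1 uniquely identifies rows in both data frames; the sample values and the names idx and df2_aligned are made up for illustration.

```r
# Illustrative data (assumed): df2 has the same ids in a different order
df1 <- data.frame(DAT1 = 1:3,
                  DAT3 = c("first", "second", "third"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(DAT1 = c(3L, 1L, 2L),
                  DAT3 = c("c", "a", "b"),
                  stringsAsFactors = FALSE)

cols <- "DAT3"

# For each row of df1, find the row of df2 with the same id (NA if none)
idx <- match(df1$DAT1, df2$DAT1)
stopifnot(!anyNA(idx))  # fail early if an id is missing from df2

# Reorder df2 so that its rows line up with df1
df2_aligned <- df2[idx, , drop = FALSE]

df3 <- df1
for (i in cols) {
  df3[[i]] <- paste(df1[[i]], df2_aligned[[i]])
}
```

The loop still runs once per column name only, so the cost stays dominated by the vectorised paste and match calls.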

Answer 3 (score: 0)

A dplyr version

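library(dplyr)

# rowwise() is needed here only because collapse = " " would otherwise
# merge all rows of a column into a single string
df1 %>% inner_join(df2, by = "DAT1") %>% rowwise() %>%
  mutate(DAT3 = paste(DAT3.x, DAT3.y, collapse = " "),
         DAT4 = paste(DAT4.x, DAT4.y, collapse = " ")) %>%
  select(everything(), -contains("."))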

Output

# A tibble: 3 x 3
   DAT1 DAT3              DAT4             
  <dbl> <chr>             <chr>            
1     1 this is a comment this is a comment
2     2 this is a comment this is a comment
3     3 this is a comment this is a comment
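With many columns, writing one mutate() argument per column gets tedious. A possible variant, assuming dplyr 1.0.0 or later and that both data frames already have their rows in the same order, is to use across() with cur_column(). The sample data below is assumed for illustration.

```r
library(dplyr)

# Illustrative data (assumed; mirrors the layout in the question)
df1 <- data.frame(DAT1 = 1:3,
                  DAT3 = c("this is", "this is", "this is"),
                  DAT4 = c("this is", "this is", "this is"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(DAT1 = 1:3,
                  DAT3 = c("a comment", "a comment", "a comment"),
                  DAT4 = c("a comment", "a comment", "a comment"),
                  stringsAsFactors = FALSE)

cols <- c("DAT3", "DAT4")

# across() applies the lambda to each selected column of df1;
# cur_column() gives that column's name, used to look up the
# same-named column in df2. No join or rowwise() is involved,
# so this assumes the rows of df1 and df2 are already aligned.
df3 <- df1 %>%
  mutate(across(all_of(cols), ~ paste(.x, df2[[cur_column()]])))
```

Because paste is applied to whole columns at once, this avoids the per-row overhead of rowwise().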