Question

I have two data frames with different numbers of rows, df_a and df_b. Here is an example of my data structure.

df_a

id   ident   var
1    Test1   
2    Test1   
3    Test2   
4    Test1
5    Test3

df_b

id   ident   var
1    Test1   26
2    Test2   59

Now I want to copy df_b$var into df_a$var, but only for rows where ident matches.

The result should look like this:

df_a

id   ident   var
1    Test1   26
2    Test1   26
3    Test2   59
4    Test1   26
5    Test3   NA

I'm not quite sure how to do this. Can anyone help?

Answer 1

Using your data:

# I have removed the var column because 1) it is blank in your case
# and 2) it will be filled in anyway
df_a <- read.table(header=T, text='id   ident 
1    Test1   
2    Test1   
3    Test2   
4    Test1
5    Test3')

df_b <- read.table(header=T, text='id   ident   var
1    Test1   26
2    Test2   59')

In base R:

# df_a stays as it is since you need all of its columns
# from df_b we only need ident (the link) and var (the column to add)
# the by argument names the "link" column
# all.x=TRUE keeps every row of df_a in the result, even rows with no match in df_b
merge(df_a, df_b[c('ident','var')], by='ident', all.x=TRUE)

  ident id var
1 Test1  1  26
2 Test1  2  26
3 Test1  4  26
4 Test2  3  59
5 Test3  5  NA
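
Note that merge() sorts the result by the key column, so the rows and columns no longer follow the layout of df_a. If the original order matters, one possible way to restore it is sketched below. It assumes id defines the original row order; the name merged is only illustrative.

```r
df_a <- data.frame(id = 1:5,
                   ident = c("Test1", "Test1", "Test2", "Test1", "Test3"),
                   stringsAsFactors = FALSE)
df_b <- data.frame(id = 1:2, ident = c("Test1", "Test2"), var = c(26, 59),
                   stringsAsFactors = FALSE)

# Left join on ident; merge() returns the rows sorted by ident
merged <- merge(df_a, df_b[c("ident", "var")], by = "ident", all.x = TRUE)

# Sort rows by id again and put the columns back in their original order
merged <- merged[order(merged$id), c("id", "ident", "var")]
rownames(merged) <- NULL
merged
```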

Answer 2

We can use a join from data.table. We convert the first 'data.frame' to a 'data.table' (setDT(df_a)) and join it with 'df_b' using on = 'ident'.

library(data.table)#v1.9.6+
setDT(df_a)[df_b, var := i.var, on = 'ident'][]
#   id ident var
#1:  1 Test1  26
#2:  2 Test1  26
#3:  3 Test2  59
#4:  4 Test1  26
#5:  5 Test3  NA

Note: in the solution above, I removed the empty 'var' column from 'df_a'.

Edit: based on @Arun's comment.

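Alternatively, we can use match from base R to get the numeric index and use it to pull the corresponding 'var' from 'df_b'. This approach also works when 'df_a' still has an empty 'var' column.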

df_a$var <- df_b$var[match(df_a$ident, df_b$ident)]
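
To see why this works, it can help to look at what match() returns on its own. The sketch below rebuilds the example data and prints the intermediate index. It assumes the blank var column in df_a was read in as NA, and idx is just an illustrative name.

```r
df_a <- data.frame(id = 1:5,
                   ident = c("Test1", "Test1", "Test2", "Test1", "Test3"),
                   var = NA,
                   stringsAsFactors = FALSE)
df_b <- data.frame(id = 1:2, ident = c("Test1", "Test2"), var = c(26, 59),
                   stringsAsFactors = FALSE)

# For each ident in df_a: position of its first match in df_b, NA if there is none
idx <- match(df_a$ident, df_b$ident)
idx
# [1]  1  1  2  1 NA

# Indexing with NA yields NA, so rows without a match end up as NA
df_a$var <- df_b$var[idx]
df_a
```

Because match() only reports the first matching position, this relies on ident being unique in df_b.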

Answer 3

With the dplyr package, this is simple:

library(dplyr)
result = left_join(df_a, df_b, by = 'ident')

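However, this will carry over some redundant columns. Assuming df_a still has its empty var column, the id and var columns shared by both data frames come back with .x and .y suffixes. To clean them up, use: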

result = select(result, id = id.x, ident, var = var.y)
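
Another option that avoids the suffixed columns altogether is to keep only the key and the value column of df_b before joining. This is a sketch, assuming df_a has no var column of its own (as in the data from Answer 1):

```r
library(dplyr)

df_a <- data.frame(id = 1:5,
                   ident = c("Test1", "Test1", "Test2", "Test1", "Test3"),
                   stringsAsFactors = FALSE)
df_b <- data.frame(id = 1:2, ident = c("Test1", "Test2"), var = c(26, 59),
                   stringsAsFactors = FALSE)

# Only ident and var from df_b take part in the join,
# so no column name clashes and no .x/.y suffixes appear
result <- left_join(df_a, select(df_b, ident, var), by = "ident")
result
```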

Add a value to rows if a condition is met - in R
