How do I replace elements of a data frame that match another data frame, while keeping the ones that do not match?

Time: 2015-02-10 16:55:13

Tags: r dataframe match

I am trying to replace some elements of a data frame wherever they match elements of another data frame.

DF1:

             V1           V2   V3
10 JP_00267-008 JP_00267-008 Line
11 JP_00302-049 JP_00302-049 Line
12      4FP3188      4FP3188 Line
13 JP_00284-029 JP_00284-029 Line
14 JP_00268-005 JP_00268-005 Line
15 JP_00265-057 JP_00265-057 Line
16 JP_00286-010 JP_00286-010 Line
17 JP_00283-008 JP_00283-008 Line
18 JP_00330-298 JP_00330-298 Line
19 JP_00269-035 JP_00269-035 Line
20 JP_00300-106 JP_00300-106 Line

DF2:

             V1      V2
10 JP_00267-008 4FP3428 
11 JP_00302-049 4FP5103 
13 JP_00284-029 4FP4137 
14 JP_00268-005 4FP3465 
15 JP_00265-057 4FP3367 
16 JP_00286-010 4FP4245 
17 JP_00283-008 4FP4085 
18 JP_00330-298 4PP3992 
19 JP_00269-035 4FP3575 
20 JP_00300-106 4FP4963

The output I want is:

         V1           V2   V3
10  4FP3428 JP_00267-008 Line
11  4FP5103 JP_00302-049 Line
12  4FP3188      4FP3188 Line
13  4FP4137 JP_00284-029 Line
14  4FP3465 JP_00268-005 Line
15  4FP3367 JP_00265-057 Line
16  4FP4245 JP_00286-010 Line
17  4FP4085 JP_00283-008 Line
18  4PP3992 JP_00330-298 Line
19  4FP3575 JP_00269-035 Line
20  4FP4963 JP_00300-106 Line

But what I get is:

         V1           V2   V3
10  4FP3428 JP_00267-008 Line
11  4FP5103 JP_00302-049 Line
12     <NA>      4FP3188 Line
13  4FP4137 JP_00284-029 Line
14  4FP3465 JP_00268-005 Line
15  4FP3367 JP_00265-057 Line
16  4FP4245 JP_00286-010 Line
17  4FP4085 JP_00283-008 Line
18  4PP3992 JP_00330-298 Line
19  4FP3575 JP_00269-035 Line
20  4FP4963 JP_00300-106 Line

This is the code I used:

df1[,1] <- df2[match(as.character(unlist(df1[,1])), as.character(df2[[1]])), 2]

Can anyone help me get rid of the NA and keep the original element in its place?

Thanks in advance.

2 Answers:

Answer 0: (score: 3)

If you want to stick with base R, use

# an index which includes missing values
idx <- match(as.character(unlist(df1[,1])), as.character(df2[[1]]))

# an index of the non-missing values in `idx`
idx_not_missing <- !is.na(idx)

# push the data only when the index `idx` is not missing 
df1[idx_not_missing,1] <- df2[idx[idx_not_missing], 2]
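
The NA in the question comes from `match()`, which returns NA for any value that has no counterpart, and indexing `df2` with NA then yields NA. A minimal sketch of the same idea written with `ifelse()`, using a trimmed copy of the data and assuming the columns are character vectors rather than factors:

```r
# Trimmed copies of the question's data; character columns are an assumption
df1 <- data.frame(
  V1 = c("JP_00267-008", "JP_00302-049", "4FP3188", "JP_00284-029"),
  V2 = c("JP_00267-008", "JP_00302-049", "4FP3188", "JP_00284-029"),
  V3 = "Line",
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  V1 = c("JP_00267-008", "JP_00302-049", "JP_00284-029"),
  V2 = c("4FP3428", "4FP5103", "4FP4137"),
  stringsAsFactors = FALSE
)

# Position of each df1$V1 in df2$V1; NA where there is no match
idx <- match(df1$V1, df2$V1)

# Take the replacement where a match exists, otherwise keep the original value
df1$V1 <- ifelse(is.na(idx), df1$V1, df2$V2[idx])
```

With factor columns, `ifelse()` would return integer codes, so converting with `as.character()` first would be needed in that case.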

Answer 1: (score: 1)

Here is an option using data.table
 library(data.table)
 # setDT() converts df1 to a data.table by reference;
 # setkey() sorts it by V1, so the row order changes
 setkey(setDT(df1), V1)[df2, V1:=i.V2][]
 #          V1           V2   V3
 #  1: 4FP3188      4FP3188 Line
 #  2: 4FP3367 JP_00265-057 Line
 #  3: 4FP3428 JP_00267-008 Line
 #  4: 4FP3465 JP_00268-005 Line
 #  5: 4FP3575 JP_00269-035 Line
 #  6: 4FP4085 JP_00283-008 Line
 #  7: 4FP4137 JP_00284-029 Line
 #  8: 4FP4245 JP_00286-010 Line
 #  9: 4FP4963 JP_00300-106 Line
 # 10: 4FP5103 JP_00302-049 Line
 # 11: 4PP3992 JP_00330-298 Line
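
If the original row order matters, one possible variant is an update join with `on=` instead of `setkey()`, which avoids the sort. This is a sketch, not part of the original answer; it assumes data.table 1.9.6 or later (where `on=` is available), and `dt1` and `dt2` are hypothetical trimmed copies of the data:

```r
library(data.table)

# Hypothetical trimmed copies of df1 and df2
dt1 <- data.table(
  V1 = c("JP_00267-008", "JP_00302-049", "4FP3188", "JP_00284-029"),
  V2 = c("JP_00267-008", "JP_00302-049", "4FP3188", "JP_00284-029"),
  V3 = "Line"
)
dt2 <- data.table(
  V1 = c("JP_00267-008", "JP_00302-049", "JP_00284-029"),
  V2 = c("4FP3428", "4FP5103", "4FP4137")
)

# Update join: rows of dt1 that match dt2 on V1 get dt2's V2 (i.V2);
# unmatched rows are left untouched and the row order is preserved
dt1[dt2, on = "V1", V1 := i.V2]
```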

Or using dplyr

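 library(dplyr)
 # assumes V1 and V2 are character; ifelse() on factors returns integer codes
 left_join(df1, df2, by='V1') %>% 
           # fall back to the original V1 where df2 has no match
           mutate(V2.y= ifelse(is.na(V2.y), V1, V2.y)) %>%
           select(-V1) %>% 
           rename(V1=V2.y, V2=V2.x)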
 #             V2   V3      V1
 #1  JP_00267-008 Line 4FP3428
 #2  JP_00302-049 Line 4FP5103
 #3       4FP3188 Line 4FP3188
 #4  JP_00284-029 Line 4FP4137
 #5  JP_00268-005 Line 4FP3465
 #6  JP_00265-057 Line 4FP3367
 #7  JP_00286-010 Line 4FP4245
 #8  JP_00283-008 Line 4FP4085
 #9  JP_00330-298 Line 4PP3992
 #10 JP_00269-035 Line 4FP3575
 #11 JP_00300-106 Line 4FP4963
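
The pipeline above leaves the columns in the order V2, V3, V1. A possible variant that keeps V1, V2, V3 in place uses `coalesce()`; this is a sketch on a trimmed copy of the data, assuming character columns, and `new_V1` and `result` are names made up for illustration:

```r
library(dplyr)

# Trimmed copies of the question's data; character columns are an assumption
df1 <- data.frame(
  V1 = c("JP_00267-008", "JP_00302-049", "4FP3188", "JP_00284-029"),
  V2 = c("JP_00267-008", "JP_00302-049", "4FP3188", "JP_00284-029"),
  V3 = "Line",
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  V1 = c("JP_00267-008", "JP_00302-049", "JP_00284-029"),
  V2 = c("4FP3428", "4FP5103", "4FP4137"),
  stringsAsFactors = FALSE
)

result <- df1 %>%
  # rename df2's V2 up front so the join produces no .x/.y suffixes
  left_join(rename(df2, new_V1 = V2), by = "V1") %>%
  # coalesce() takes new_V1 where present, otherwise the original V1
  mutate(V1 = coalesce(new_V1, V1)) %>%
  select(-new_V1)
```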