Merge two data frames by similar names?

Time: 2017-06-08 21:29:41

Tags: r merge

Is it possible to merge two data sets on a column whose values are similar but not identical?

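# Build real data frames; cbind() would give a character matrix instead
country <- c("United States of America", "China", "Russia Federation")
scores <- c(1, 2, 3)
df.1 <- data.frame(country, scores, stringsAsFactors = FALSE)

country <- c("United States", "China", "Russians")
scores <- c(3, 2, 1)
df.2 <- data.frame(country, scores, stringsAsFactors = FALSE)

# merge() only keeps rows whose country strings match exactly
unsuccessful.merge <- merge(df.1, df.2, by = "country")
unsuccessful.merge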
>   country scores.x scores.y
> 1   China        2        2

As you can see, the United States and Russia were dropped by the merge and only China is left. I would like the data frame to look like this:

successful.merge
>                    country scores.x scores.y
> 1                    China        2        2
> 2        Russia Federation        3        1
> 3 United States of America        1        3

1 Answer:

Answer 0: (score: 0)

If you already know all the variants of the country names, you can normalize them with regular expressions before merging.

# Map known spelling variants to one canonical name; TRUE is safer than T, which can be reassigned
fix_names <- function(x) {
  x <- gsub(".*United States.*|USA", "United States", x, ignore.case = TRUE)
  gsub(".*Russia.*", "Russia", x, ignore.case = TRUE)
}
df.1 <- as.data.frame(df.1, stringsAsFactors = FALSE)
df.2 <- as.data.frame(df.2, stringsAsFactors = FALSE)
df.1$country <- fix_names(df.1$country)
df.2$country <- fix_names(df.2$country)
merge(df.1, df.2, by = "country")
        country scores.x scores.y
1         China        2        2
2        Russia        3        1
3 United States        1        3
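
If the variants are not known in advance, approximate string matching may be an alternative. This is a different technique from the regular expressions above, not part of the original answer. The sketch below uses `adist()` from base R's utils package to pair each name in df.2 with the closest name in df.1, then merges on the matched spelling. The names `nearest`, `max.dist` and `fuzzy.merge` and the cutoff value of 3 are assumptions made for illustration. It also assumes the df.2 names are roughly contained in the df.1 names, which happens to hold for this example.

```r
# Example data from the question, built as data frames
df.1 <- data.frame(country = c("United States of America", "China", "Russia Federation"),
                   scores = c(1, 2, 3), stringsAsFactors = FALSE)
df.2 <- data.frame(country = c("United States", "China", "Russians"),
                   scores = c(3, 2, 1), stringsAsFactors = FALSE)

# Edit distance from each df.2 name (rows) to the best-matching substring
# of each df.1 name (columns); partial = TRUE is the agrep-style distance
d <- adist(df.2$country, df.1$country, partial = TRUE, ignore.case = TRUE)

# Index of the closest df.1 name for every df.2 name
nearest <- apply(d, 1, which.min)

# Hypothetical cutoff: a best distance above this is treated as "no match"
max.dist <- 3
nearest[d[cbind(seq_along(nearest), nearest)] > max.dist] <- NA

# Rewrite df.2's key with the matched df.1 spelling, then merge as usual
df.2$country <- df.1$country[nearest]
fuzzy.merge <- merge(df.1, df.2, by = "country")
fuzzy.merge
```

On this example data the result should contain the three rows the question asks for, keeping the df.1 spellings. On real data it is worth inspecting `d` and the matched pairs by hand, since a nearest match is not necessarily a correct one and the cutoff needs tuning.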