Is it possible to merge two datasets on a column with similar, but not identical, names?
country <- c("United States of America", "China", "Russia Federation")
scores <- c(1, 2, 3)
df.1 <- cbind(country, scores)
country <- c("United States", "China", "Russians")
scores <- c(3, 2, 1)
df.2 <- cbind(country, scores)
unsuccessful.merge <- merge(df.1, df.2, by=c("country"))
unsuccessful.merge
> country scores.x scores.y
> 1 China 2 2
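As you can see, after the merge both the United States and Russia are dropped and only China is left. I would like the data frame to look like this: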
successful.merge
> country scores.x scores.y
> 1 China 2 2
> 2 Russia Federation 3 1
> 3 United States of America 1 3
Answer 0 (score: 0)
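If you already know all the variants of the country names, you can use regular expressions to map them onto a single spelling before merging.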
# Helper: collapse every known spelling variant onto one canonical name.
normalize.country <- function(x) {
  x <- gsub(".*United States.*|USA", "United States", x, ignore.case = TRUE)
  gsub(".*Russia.*", "Russia", x, ignore.case = TRUE)
}
# Only the key column needs normalizing; the scores are left untouched.
df.1[, "country"] <- normalize.country(df.1[, "country"])
df.2[, "country"] <- normalize.country(df.2[, "country"])
merge(df.1, df.2, by = "country")
country scores.x scores.y
1 China 2 2
2 Russia 3 1
3 United States 1 3
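
If the variants are not known in advance, one possible alternative is approximate matching on edit distance. The following is only a sketch, not part of the regex approach above: it assumes that the closest name by `adist()` (from base R's `utils` package) is the right match, which holds for this sample data but can easily pair up the wrong rows on real data, so the matches should be checked by hand. The names `closest`, `match` and `fuzzy.merge` are made up for illustration, and the inputs are rebuilt with `data.frame()` rather than `cbind()` so that the scores stay numeric.

```r
# Sample data from the question, as data frames instead of character matrices.
df.1 <- data.frame(country = c("United States of America", "China", "Russia Federation"),
                   scores = c(1, 2, 3), stringsAsFactors = FALSE)
df.2 <- data.frame(country = c("United States", "China", "Russians"),
                   scores = c(3, 2, 1), stringsAsFactors = FALSE)

# Edit-distance matrix: rows are df.1 names, columns are df.2 names.
d <- adist(df.1$country, df.2$country, ignore.case = TRUE)

# For each df.1 name, pick the index of the closest df.2 name.
# Assumption: the nearest name is the intended match (verify manually).
closest <- apply(d, 1, which.min)
df.1$match <- df.2$country[closest]

# Join df.1's matched name against df.2's country column.
fuzzy.merge <- merge(df.1, df.2, by.x = "match", by.y = "country")
fuzzy.merge[, c("country", "scores.x", "scores.y")]
```

On this sample the result keeps the `df.1` spelling of each country alongside both score columns, as in the desired output from the question. A distance threshold (for example, discarding matches where the minimum in `d` is too large) would be a sensible safeguard before using this on a longer list of names.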