Merging two datasets using partial string matching in R

Time: 2015-03-26 20:21:01

Tags: r

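I have two datasets, df1 and df2, shown below. I want to merge them into one by matching df1$pkg against df2$name. However, the strings in df1$pkg and df2$name are not exactly the same. I tried using agrep, but it did not work. Any help would be appreciated.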

x<-agrep(df1[,2], df2[,1],ignore.case=T, value=T)
Warning message:
In agrep(df1[, 2], df2[, 1], ignore.case = T, value = T) :
  argument 'pattern' has length > 1 and only the first element will be used
> x
character(0)



df1 <- data.frame(apname = c("photo eff pro", "olx", "firefox", "word search", "chrome", "bbc news"),
                  pkg = c("bbc.mobile.news", "com.dhqsolutions", "#com.olx.olx", "org.mozilla.firefox", "ws.letras", "com.chrome"),
                  apcat = c(5, 3, 4, 5, 4, 1))
df2 <- data.frame(name = c("bbc.mob.news.ww", "com.dhqsolutions.enjoyphoto", "com.olx.olx", "org.mozilla.firefox", "ws.letras", "chrome.approximated"),
                  tic = c(10000, 12345, 123456, 23456, 9903, 12389034))

1 answer:

Answer 0: (score: 0)

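You will want to use the apply family of functions (see this excellent Q&A) or some kind of loop. agrep accepts only a single string as its pattern argument, so passing a whole column makes it use the first element only, which is why you got the warning message. Consider this simple example: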

vec1 <- c("Dog", "Cat", "Pony")
vec2 <- c("catty", "doggy", "fish", "ponyt")
sapply(vec1, agrep, vec2)
# Dog  Cat  Pony 
#   2    1     4 
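The loop alternative mentioned above can be sketched as follows. This is only an illustration using the same toy vectors; the variable name `matches` is made up here. It stores the result for each pattern in a list, which also copes with patterns that have no match or several matches.

```r
vec1 <- c("Dog", "Cat", "Pony")
vec2 <- c("catty", "doggy", "fish", "ponyt")

# Pre-allocate one list slot per pattern and name the slots after the patterns.
matches <- vector("list", length(vec1))
names(matches) <- vec1

# Call agrep once per pattern, so 'pattern' always has length 1.
for (i in seq_along(vec1)) {
  matches[[i]] <- agrep(vec1[i], vec2)
}

matches
```

Each element of `matches` holds the positions in `vec2` that approximately match the corresponding entry of `vec1`.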

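In your case, you would probably want to do something like the following: sapply(df1$pkg, agrep, df2$name). Note that the result is a list rather than a vector whenever some entries have no match or more than one match.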
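One possible way to turn those match positions into a merged data frame is sketched below. It rests on several assumptions that are not part of the original answer: the helper `first_match` is hypothetical, the `max.distance = 0.3` threshold is an arbitrary illustrative value that would need tuning for real data, and only the first hit per package is kept. Strings are stored as character vectors (`stringsAsFactors = FALSE`) so that agrep receives plain strings.

```r
# Sample data from the question, kept as character columns.
df1 <- data.frame(
  apname = c("photo eff pro", "olx", "firefox", "word search", "chrome", "bbc news"),
  pkg = c("bbc.mobile.news", "com.dhqsolutions", "#com.olx.olx",
          "org.mozilla.firefox", "ws.letras", "com.chrome"),
  apcat = c(5, 3, 4, 5, 4, 1),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("bbc.mob.news.ww", "com.dhqsolutions.enjoyphoto", "com.olx.olx",
           "org.mozilla.firefox", "ws.letras", "chrome.approximated"),
  tic = c(10000, 12345, 123456, 23456, 9903, 12389034),
  stringsAsFactors = FALSE
)

# Hypothetical helper: position of the first approximate match of 'pattern'
# in 'candidates', or NA when agrep finds nothing.
# max_dist = 0.3 is an assumption chosen for illustration only.
first_match <- function(pattern, candidates, max_dist = 0.3) {
  hits <- agrep(pattern, candidates, max.distance = max_dist, ignore.case = TRUE)
  if (length(hits) == 0) NA_integer_ else hits[1]
}

# One index into df2 per row of df1 (NA where no match was found).
idx <- vapply(df1$pkg, first_match, integer(1),
              candidates = df2$name, USE.NAMES = FALSE)

# Attach the matched df2 rows to df1; unmatched rows get NA in name and tic.
merged <- cbind(df1, df2[idx, , drop = FALSE])
rownames(merged) <- NULL
merged
```

Keeping only the first hit is a simplification. If several names in df2 can match the same package, you may prefer to inspect all hits, or rank them with adist, before deciding which row to keep.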