Compare a data frame column with another data frame column

Time: 2016-03-11 14:07:32

Tags: r grep apply grepl

I have a data frame (let's call it A) with a column of page paths:

pagePath
/text/other_text/123-string1-4571/text.html
/text/other_text/string2/15-some_other_txet.html
/text/other_text/25189-string3/45112-text.html
/text/other_text/text/string4/5418874-some_other_txet.html
/text/other_text/string5/text/some_other_txet-4157/text.html
/text/other_text/123-text-4571/text.html
/text/other_text/125-text-471/text.html

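I also have another data frame with a column of strings, which we can call B. The two data frames are different and do not have the same number of rows.

Here is a sample of the column in data frame B: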

names
string1
string11
string4
string3
string2
string10
string5
string100

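What I want to do is check whether each page path in A contains one of the strings from the other data frame (B).

I am having trouble because the two data frames are not the same length and the rows are not in matching order.

Expected output

I would like to get something like this: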

pagePath                                                      names    exist
/text/other_text/123-string1-4571/text.html                   string1  TRUE
/text/other_text/string2/15-some_other_txet.html              string2  TRUE
/text/other_text/25189-string3/45112-text.html                string3  TRUE
/text/other_text/text/string4/5418874-some_other_txet.html    string4  TRUE
/text/other_text/string5/text/some_other_txet-4157/text.html  string5  TRUE
/text/other_text/123-text-4571/text.html                      NA       FALSE
/text/other_text/125-text-471/text.html                       NA       FALSE

If my question needs further clarification, please say so.

4 answers:

Answer 0: (score: 2)

We can generate the exist column using grepl():

# Collapse B$names into one string with "|" 
onestring <- paste(B$names, collapse = "|") 

# Generate new column
A$exist <- grepl(onestring, A$pagePath)
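
The snippet above only fills exist. As a rough sketch of how the names column from the expected output could be filled with the same collapsed pattern, base R's regexpr() and regmatches() can pull out the first match in each path. The small A and B below are cut-down copies of the question's data, built here only so the example runs on its own; it also assumes the names contain no regex metacharacters.

```r
# Cut-down copies of the question's data (assumption: character columns, not factors)
A <- data.frame(
  pagePath = c(
    "/text/other_text/123-string1-4571/text.html",
    "/text/other_text/string2/15-some_other_txet.html",
    "/text/other_text/123-text-4571/text.html"
  ),
  stringsAsFactors = FALSE
)
B <- data.frame(names = c("string1", "string11", "string2"),
                stringsAsFactors = FALSE)

# Collapse B$names into one alternation pattern
onestring <- paste(B$names, collapse = "|")

# TRUE where any of the names occurs in the path
A$exist <- grepl(onestring, A$pagePath)

# regexpr() returns the position of the first match, or -1 when there is none
m <- regexpr(onestring, A$pagePath)

# regmatches() returns only the matched substrings, so assign them to the matching rows
A$names <- NA_character_
A$names[m != -1] <- regmatches(A$pagePath, m)

A
```

Rows without a match keep NA in names and FALSE in exist, which is the shape asked for in the question.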

Answer 1: (score: 2)

Not very elegant, because it uses a for loop:

names <- rep(NA, length(A$pagePath))
exist <- rep(FALSE, length(A$pagePath))

for (name in B$names) {
  names[grep(name, A$pagePath)] <- name
  exist[grep(name, A$pagePath)] <- TRUE
}
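
For comparison, here is one possible loop-free variant of the same idea, offered as a sketch rather than a drop-in replacement. It builds a logical matrix with one column per name and matches literally (fixed = TRUE), so names are not interpreted as regular expressions. The small A and B are cut-down copies of the question's data, and hits and first_hit are names made up for this example. Note one difference: when several names match a path, this keeps the first name in B's order, while the loop above keeps the last.

```r
# Cut-down copies of the question's data (assumption: character columns, not factors)
A <- data.frame(
  pagePath = c(
    "/text/other_text/123-string1-4571/text.html",
    "/text/other_text/string2/15-some_other_txet.html",
    "/text/other_text/123-text-4571/text.html"
  ),
  stringsAsFactors = FALSE
)
B <- data.frame(names = c("string1", "string11", "string2"),
                stringsAsFactors = FALSE)

# One column per name, one row per path; TRUE where the name occurs literally
hits <- vapply(
  B$names,
  function(nm) grepl(nm, A$pagePath, fixed = TRUE),
  logical(nrow(A))
)

# A path "exists" if at least one name matched it
exist <- rowSums(hits) > 0

# Index of the first matching name per row (NA when nothing matched)
first_hit <- apply(hits, 1, function(r) which(r)[1])
names <- B$names[first_hit]
```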

Answer 2: (score: 2)

We can use str_extract_all from the stringr package. Where there is no match it returns character(0), so we have to replace that with NA:

library(stringr)

df$names <- as.character(str_extract_all(df$pagePath, "string[0-9]+"))
df$exist <- df$names %in% df1$names
df[df=="character(0)"] <- NA
df
#                                                 pagePath       names   exist
#1                  /text/other_text/123-string1-4571/text.html string1  TRUE
#2             /text/other_text/string2/15-some_other_txet.html string2  TRUE
#3               /text/other_text/25189-string3/45112-text.html string3  TRUE
#4   /text/other_text/text/string4/5418874-some_other_txet.html string4  TRUE
#5 /text/other_text/string5/text/some_other_txet-4157/text.html string5  TRUE
#6                     /text/other_text/123-text-4571/text.html    <NA> FALSE
#7                      /text/other_text/125-text-471/text.html    <NA> FALSE
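
As a possible simplification, stringr also has str_extract(), which returns only the first match per element and gives NA where nothing matches, so the character(0) replacement step is not needed. The sketch below assumes the stringr package is installed and uses cut-down copies of df and df1 so that it runs on its own.

```r
library(stringr)

# Cut-down copies of the data (assumption: character columns, not factors)
df <- data.frame(
  pagePath = c(
    "/text/other_text/123-string1-4571/text.html",
    "/text/other_text/123-text-4571/text.html"
  ),
  stringsAsFactors = FALSE
)
df1 <- data.frame(names = c("string1", "string11", "string2"),
                  stringsAsFactors = FALSE)

# str_extract() returns the first match per element, or NA when there is none
df$names <- str_extract(df$pagePath, "string[0-9]+")

# NA is not in df1$names, so unmatched rows come out as FALSE
df$exist <- df$names %in% df1$names
```

Like the original, this relies on every name following the string[0-9]+ shape; names with a different form would need a different pattern.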

Data

dput(df)
structure(list(pagePath = structure(c(1L, 5L, 4L, 7L, 6L, 2L, 
3L), .Label = c("/text/other_text/123-string1-4571/text.html", 
"/text/other_text/123-text-4571/text.html", "/text/other_text/125-text-471/text.html", 
"/text/other_text/25189-string3/45112-text.html", "/text/other_text/string2/15-some_other_txet.html", 
"/text/other_text/string5/text/some_other_txet-4157/text.html", 
"/text/other_text/text/string4/5418874-some_other_txet.html"), class = "factor")), .Names = "pagePath", class = "data.frame", row.names = c(NA, 
-7L))
dput(df1)
structure(list(names = structure(c(1L, 4L, 7L, 6L, 5L, 2L, 8L, 
3L), .Label = c("string1", "string10", "string100", "string11", 
"string2", "string3", "string4", "string5"), class = "factor")), .Names = "names", class = "data.frame", row.names = c(NA, 
-8L))

Answer 3: (score: 0)

Here is one way using apply:

df$exist <- apply( df,1,function(x){as.logical(grepl(x[2],x[1]))} )
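
The apply call above reads the pattern from the second column and the path from the first, so it assumes df already has a names column next to pagePath. With an NA pattern, grepl() returns NA rather than FALSE. A minimal sketch of a row-wise alternative that handles that case, using mapply() and literal matching, might look like this (the two-row df is made up for illustration):

```r
# Illustrative data: pagePath in column 1, an already-filled names column in column 2
df <- data.frame(
  pagePath = c(
    "/text/other_text/123-string1-4571/text.html",
    "/text/other_text/123-text-4571/text.html"
  ),
  names = c("string1", NA),
  stringsAsFactors = FALSE
)

# Pair each name with its own path; a missing name counts as "does not exist"
df$exist <- mapply(
  function(name, path) !is.na(name) && grepl(name, path, fixed = TRUE),
  df$names, df$pagePath,
  USE.NAMES = FALSE
)
```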