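String matching against a list in R

Date: 2016-04-18 12:47:59

Tags: r text

I am trying to do string matching in R using grep. I need to match df1$ColA against df2$ColA. The input and the expected output are given below: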

Input:

DF1:

ColA
text1
text2
text3
text4
text5
text6
text7

DF2:

ColA
text1 text2 text12
text23 text22 text7

Intermediate output:

ColA                    ColB
text1 text2 text12     text1, text2
text23 text22 text7    text7

Final output:

ColA                ColB
text1 text2 text12   text1
text1 text2 text12   text2
text23 text22 text7  text7

Approach:

I am currently using

test$test <- sapply(df2$ColA, function(x) ifelse(grep(paste(as.character(unlist(df1$ColA)),collapse="|"),x),1,0))

This tells me whether any df1$ColA string matches df2$ColA, but it does not return the matched strings. Please advise.

2 answers:

Answer 0: (score: 0)

This may help you:

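# Build sample data: 5 strings, each made of 5 random letters separated by spaces
df2 <- matrix(sample(LETTERS)[-1], nrow=5)
df2 <- apply(df2, 1, FUN=function(x) paste(x, collapse=' '))

data <- data.frame(a=LETTERS[1:5], b=df2) ; data

# Split each string in column b into its individual tokens
df2 <- sapply(1:nrow(data), function(x) strsplit(as.character(data$b[x]), ' '))

# Position of a[x] among the tokens of row x (integer(0) if it is absent)
sapply(1:nrow(data), function(x) which(data$a[x] == df2[[x]]))

# Comparison of a[x] with each token of row x (one column per row)
sapply(1:nrow(data), function(x) data$a[x] == df2[[x]])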
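The same split-and-compare idea can be applied directly to the data in the question to get the intermediate output. This is only a minimal sketch: the data frames are reconstructed from the question's tables, and `tokens` is an illustrative name. Comparing whole tokens with `%in%` also avoids a pitfall of a regex alternation such as `text1|text2|...`, where `text1` would match inside `text12`.

```r
# Assumed data, reconstructed from the question's input tables
df1 <- data.frame(ColA = paste0("text", 1:7), stringsAsFactors = FALSE)
df2 <- data.frame(ColA = c("text1 text2 text12", "text23 text22 text7"),
                  stringsAsFactors = FALSE)

# Split each df2 row into whole-word tokens
tokens <- strsplit(df2$ColA, " ", fixed = TRUE)

# Keep only the tokens that appear in df1$ColA and collapse them per row
df2$ColB <- vapply(tokens,
                   function(tok) paste(tok[tok %in% df1$ColA], collapse = ", "),
                   character(1))
df2
```

A row with no matching token ends up with an empty string in ColB.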

Answer 1: (score: 0)

Here is a semi-vectorized solution based on match() that should quickly produce what you are looking for. It splits each df2$ColA string into tokens on spaces and matches every token against df1$ColA. It then repeats the full (original) df2$ColA elements once per token, keeps the repeats that had a match, and adds the matched df1$ColA values as ColB in the output.

# set up the data, which the OP should have done
df1 <- data.frame(ColA = paste0("text", 1:7), stringsAsFactors = FALSE)
df2 <- data.frame(ColA = c("text1 text2 text12", "text23 text22 text7"), stringsAsFactors = FALSE)

# create a matrix of matches of first to elements of second
matmatrix <- sapply(strsplit(df2$ColA, " "), match, df1$ColA)

# repeat original text in same length as potential match
origdfColArep <- rep(df2$ColA, each = nrow(matmatrix))

# create the results dataset, first the matches of the second part
result <- data.frame(ColA = origdfColArep[!is.na(as.vector(matmatrix))], stringsAsFactors = FALSE)

# then add the matching first part
result$ColB <- df1$ColA[na.omit(as.vector(matmatrix))]
result
##                  ColA  ColB
## 1  text1 text2 text12 text1
## 2  text1 text2 text12 text2
## 3 text23 text22 text7 text7
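The match() approach relies on sapply() returning a matrix, which only happens when every df2$ColA string has the same number of tokens. As a hedged variation (not part of the original answer), the sketch below builds the same long-format result from a list, so it should also cope with rows of different lengths. The data is reconstructed from the question, and `tokens` and `matched` are illustrative names.

```r
# Assumed data, reconstructed from the question's input tables
df1 <- data.frame(ColA = paste0("text", 1:7), stringsAsFactors = FALSE)
df2 <- data.frame(ColA = c("text1 text2 text12", "text23 text22 text7"),
                  stringsAsFactors = FALSE)

# Tokenize each row, then keep only the tokens found in df1$ColA
tokens  <- strsplit(df2$ColA, " ", fixed = TRUE)
matched <- lapply(tokens, function(tok) tok[tok %in% df1$ColA])

# Repeat each original string once per matched token (rows without a match drop out)
result <- data.frame(
  ColA = rep(df2$ColA, times = lengths(matched)),
  ColB = as.character(unlist(matched)),
  stringsAsFactors = FALSE
)
result
```

Because the repeat counts come from `lengths(matched)`, no NA filtering step is needed afterwards.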
