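I am trying to do string matching in R using grep. I need to match the terms in df1$ColA against the strings in df2$ColA. The input and the desired output are given below: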
df1:
ColA
text1
text2
text3
text4
text5
text6
text7

df2:
ColA
text1 text2 text12
text23 text22 text7

Desired output:
ColA                 ColB
text1 text2 text12   text1, text2
text23 text22 text7  text7

or:
ColA                 ColB
text1 text2 text12   text1
text1 text2 text12   text2
text23 text22 text7  text7

I am currently using:
test$test <- sapply(df2$ColA, function(x) ifelse(grep(paste(as.character(unlist(df1$ColA)),collapse="|"),x),1,0))
This tells me whether a df1$ColA string matches df2$ColA, but it does not return the matched string. Please advise.
Answer 0 (score: 0)
This may help you:
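# build five strings of five space-separated random letters each
df2 <- matrix(sample(LETTERS)[-1], nrow=5)
df2 <- apply(df2, 1, FUN=function(x) paste(x, collapse=' '))
# column a holds the terms to look for, column b the strings to search
data <- data.frame(a=LETTERS[1:5], b=df2) ; data
# split each string in b into its tokens
df2 <- sapply(1:nrow(data), function(x) strsplit(as.character(data$b[x]), ' '))
# positions at which a[x] occurs among the tokens of row x (empty if absent)
sapply(1:nrow(data), function(x) which(data$a[x] == df2[[x]]))
# the same comparison as logicals, one column per row of data
sapply(1:nrow(data), function(x) data$a[x] == df2[[x]])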
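The same split-and-compare idea can be applied to the data from the question. The following is only a sketch, not part of the answer above: it assumes the terms in df2$ColA are separated by single spaces, rebuilds df1 and df2 from the question, and uses %in% instead of == so that every term of df1$ColA is tested against every token. It produces the first of the two desired output forms.

```r
# data as described in the question
df1 <- data.frame(ColA = paste0("text", 1:7), stringsAsFactors = FALSE)
df2 <- data.frame(ColA = c("text1 text2 text12", "text23 text22 text7"),
                  stringsAsFactors = FALSE)

# split every df2 string into its space-separated tokens
tokens <- strsplit(df2$ColA, " ", fixed = TRUE)

# keep only the tokens that occur in df1$ColA and join them with ", "
df2$ColB <- vapply(tokens,
                   function(tok) paste(tok[tok %in% df1$ColA], collapse = ", "),
                   character(1))
df2
```

Because whole tokens are compared, text12 is not reported as a match for text1, which a pattern such as "text1|text2" passed to grep would do.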
Answer 1 (score: 0)
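Here is a semi-vectorized solution based on match() that should quickly produce what you are looking for. The approach is to split each df2$ColA string into tokens and match every token against df1$ColA. It then builds repetitions of each whole (original) df2$ColA element, one per token, keeps the repetitions whose token matched, and adds the matched df1$ColA term as ColB in the output.
# set up the data, which the OP should have done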
df1 <- data.frame(ColA = paste0("text", 1:7),
stringsAsFactors = FALSE)
df2 <- data.frame(ColA = c("text1 text2 text12",
"text23 text22 text7"),
stringsAsFactors = FALSE)
# create a matrix of matches of first to elements of second
matmatrix <- sapply(strsplit(df2$ColA, " "), match, df1$ColA)
# repeat original text in same length as potential match
origdfColArep <- rep(df2$ColA, each = nrow(matmatrix))
# create the results dataset, first the matches of the second part
result <- data.frame(ColA = origdfColArep[!is.na(as.vector(matmatrix))],
stringsAsFactors = FALSE)
# then add the matching first part
result$ColB <- df1$ColA[na.omit(as.vector(matmatrix))]
result
## ColA ColB
## 1 text1 text2 text12 text1
## 2 text1 text2 text12 text2
## 3 text23 text22 text7 text7
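
Note that the sapply() call above returns a matrix only when every df2$ColA string has the same number of tokens. As a hedged variant for the case where the token counts differ, the sketch below builds the long table directly with lengths() and filters it with %in%. The names tokens and long are illustrative and do not appear in the answer; the data are the same as above.

```r
df1 <- data.frame(ColA = paste0("text", 1:7), stringsAsFactors = FALSE)
df2 <- data.frame(ColA = c("text1 text2 text12", "text23 text22 text7"),
                  stringsAsFactors = FALSE)

# one list element of tokens per df2 string
tokens <- strsplit(df2$ColA, " ", fixed = TRUE)

# one row per token, carrying the original string alongside it
long <- data.frame(ColA = rep(df2$ColA, times = lengths(tokens)),
                   ColB = unlist(tokens),
                   stringsAsFactors = FALSE)

# keep the rows whose token is one of the df1 terms
result <- long[long$ColB %in% df1$ColA, ]
rownames(result) <- NULL
result
```

On this input the result has the same three rows as the output shown above.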