I'm looking for help on how to match strings in R, similar to what PRXMATCH() does in SAS.
List1 <- c("lead", "good")
List2 <- c("Quality", "understand")
Name  <- c("grp1", "grp2")
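I have a data frame with a column sentence. For each sentence I need to search for the words in List1 and for the corresponding words in List2. If the List2 word is found within ±5 words of the List1 word, the matching name from Name should be added to the result column. For example, search all sentences for "lead". Once "lead" is found, check whether "Quality" occurs in the same sentence; if it is found within a distance of ±5 words, "grp1" should be added to the result column, otherwise it is discarded.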
Answer 0 (score: 0)
Something like this, maybe?
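myData <- data.frame(sentence = c("The quality bla bla bla lead bla",
                                  "The quality bla bla bla bla bla lead bla",
                                  "The lead quality bla bla",
                                  "The lead bla bla quality",
                                  "The lead bla bla bla bla bla quality of",
                                  "It allows us to understand how good bla",
                                  "It is good to understand that bla",
                                  "It is also good bla bla bla if we understand",
                                  "lead quality is good to understand"),
                     Result = "",
                     stringsAsFactors = FALSE)

List1 <- c("lead", "good")
List2 <- c("quality", "understand")
Name  <- c("grp1", "grp2")

# One pattern per (List1[i], List2[i]) pair: the two words in either order,
# separated by at most 4 other words (i.e. at most 5 word positions apart).
# Named "patterns" so that it does not mask base R's regexpr() function.
patterns <- paste0("(\\b", List1, "\\s+(\\w+\\s+){0,4}", List2, "\\b)|",
                   "(\\b", List2, "\\s+(\\w+\\s+){0,4}", List1, "\\b)")

# Append the group name to Result for every sentence matching its pattern.
# paste() separates with a space, so names accumulate as " grp1 grp2".
for (i in seq_along(patterns)) {
  myData$Result <- ifelse(grepl(pattern = patterns[i], x = myData$sentence),
                          yes = paste(myData$Result, Name[i]),
                          no = myData$Result)
}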
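A side note on the pattern used above, since the question spells "Quality" with a capital letter while this answer uses "quality": grepl() is case-sensitive by default. The small sketch below rebuilds the same kind of pattern for a single word pair and checks it with ignore.case = TRUE. The names w1, w2, pat, s, hit and pos are only illustrative. It also calls regexpr(), which returns the start position of the first match (or -1 when there is none) and is probably the closest base R counterpart to what PRXMATCH() reports in SAS.

```r
# Build the same kind of pattern for one word pair ("lead" / "quality").
w1 <- "lead"
w2 <- "quality"
pat <- paste0("(\\b", w1, "\\s+(\\w+\\s+){0,4}", w2, "\\b)|",
              "(\\b", w2, "\\s+(\\w+\\s+){0,4}", w1, "\\b)")

s <- c("The Quality bla bla bla lead bla",          # 3 words in between
       "The quality bla bla bla bla bla lead bla")  # 5 words in between

# grepl() only reports whether there is a match; ignore.case = TRUE lets
# "Quality" in the text match "quality" in the pattern.
hit <- grepl(pat, s, ignore.case = TRUE, perl = TRUE)

# regexpr() gives the 1-based start of the first match, or -1 if none.
pos <- as.integer(regexpr(pat, s, ignore.case = TRUE, perl = TRUE))

print(hit)  # TRUE FALSE
print(pos)  # 5 -1
```

The {0,4} quantifier is what encodes the distance: it allows at most four other words between the two search words, so the second word can be at most five word positions away from the first.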
> myData
                                      sentence     Result
1             The quality bla bla bla lead bla       grp1
2     The quality bla bla bla bla bla lead bla
3                     The lead quality bla bla       grp1
4                     The lead bla bla quality       grp1
5      The lead bla bla bla bla bla quality of
6      It allows us to understand how good bla       grp2
7            It is good to understand that bla       grp2
8 It is also good bla bla bla if we understand
9           lead quality is good to understand  grp1 grp2
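If the regular expression gets hard to maintain, the same "within 5 words" rule could also be checked by splitting each sentence into words and comparing word positions. The following is only a rough sketch of that alternative, not part of the approach above: within_distance() is a hypothetical helper, and it assumes that words are separated by non-word characters and that matching should ignore case.

```r
# Hypothetical helper: TRUE when w1 and w2 both occur in the sentence
# at most max_dist word positions apart (case-insensitive).
within_distance <- function(sentence, w1, w2, max_dist = 5) {
  # Lower-case and split on runs of non-word characters to get the tokens.
  words <- strsplit(tolower(sentence), "\\W+")[[1]]
  words <- words[nzchar(words)]
  p1 <- which(words == tolower(w1))
  p2 <- which(words == tolower(w2))
  if (length(p1) == 0 || length(p2) == 0) return(FALSE)
  # Smallest gap between any occurrence of w1 and any occurrence of w2.
  min(abs(outer(p1, p2, "-"))) <= max_dist
}

List1 <- c("lead", "good")
List2 <- c("Quality", "understand")
Name  <- c("grp1", "grp2")

sentences <- c("The quality bla bla bla lead bla",
               "The lead bla bla bla bla bla quality of",
               "lead quality is good to understand")

# For each sentence, collect the names of all pairs that are close enough.
result <- vapply(sentences, function(s) {
  hit <- mapply(function(a, b) within_distance(s, a, b), List1, List2)
  paste(Name[hit], collapse = " ")
}, character(1), USE.NAMES = FALSE)

print(result)  # "grp1" "" "grp1 grp2"
```

Compared with the regex, this makes the distance an ordinary numeric argument, at the cost of tokenizing every sentence once per word pair.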