Question

I'm trying to match a state's county names within strings like these:

strings <- c("High School Graduate or Higher (5-year estimate) in Jefferson Parish, LA"
             ,"High School Graduate or Higher (5-year estimate) in Jefferson Davis Parish, LA")

countyName <- "Jefferson"
stateAbb <- "LA"

test <- gregexpr(paste0(countyName," (\\w), ",stateAbb,"$"),strings,ignore.case=T,perl=T)

I can't get test to actually return a match.

If I replace \\w with .*, then "Jefferson" also matches "Jefferson Davis".

Of course, I only want to match "Jefferson Davis" when the county name actually is "Jefferson Davis".
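Both behaviours described above can be reproduced with grepl(), which returns TRUE or FALSE for each string. This is a minimal sketch using the question's strings with the county and state written directly into the pattern; the variable names single_char and greedy_any are illustrative only:

```r
strings <- c("High School Graduate or Higher (5-year estimate) in Jefferson Parish, LA",
             "High School Graduate or Higher (5-year estimate) in Jefferson Davis Parish, LA")

# \w matches exactly one word character, so "Parish" (6 letters)
# can never fit between "Jefferson " and ", LA": no string matches
single_char <- grepl("Jefferson (\\w), LA$", strings, ignore.case = TRUE, perl = TRUE)

# .* matches any run of characters, including spaces, so "Davis Parish"
# is accepted too and both strings match
greedy_any <- grepl("Jefferson (.*), LA$", strings, ignore.case = TRUE, perl = TRUE)

print(single_char)  # FALSE FALSE
print(greedy_any)   # TRUE TRUE
```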

Answer 1

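Your current regex matches only a single "word" character (i.e. a letter, digit, or _ symbol) after countyName. To make it match one or more word characters, add the + quantifier to \w: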

# Note the + added after \\w
test <- gregexpr(paste0(countyName," (\\w+), ",stateAbb,"$"),strings,ignore.case=TRUE,perl=TRUE)

The resulting regex looks like this:

Jefferson (\w+), LA$
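To check that this pattern tells the two parishes apart, and that it still works when the county name is "Jefferson Davis", here is a small sketch. The helper pattern_for() is hypothetical, written only for this example; it builds the same pattern as the paste0() call above:

```r
strings <- c("High School Graduate or Higher (5-year estimate) in Jefferson Parish, LA",
             "High School Graduate or Higher (5-year estimate) in Jefferson Davis Parish, LA")
stateAbb <- "LA"

# Hypothetical helper: builds "<county> (\w+), <state>$" like the answer's paste0()
pattern_for <- function(countyName) paste0(countyName, " (\\w+), ", stateAbb, "$")

# \w+ cannot cross the space in "Davis Parish", so only the first string matches
jefferson <- grepl(pattern_for("Jefferson"), strings, ignore.case = TRUE, perl = TRUE)

# With the full county name, only the second string matches
jefferson_davis <- grepl(pattern_for("Jefferson Davis"), strings, ignore.case = TRUE, perl = TRUE)

print(jefferson)        # TRUE FALSE
print(jefferson_davis)  # FALSE TRUE
```

This works because \w excludes spaces: after "Jefferson ", the pattern allows exactly one word (such as "Parish") before ", LA".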

See the regex demo.

Details:

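Jefferson - a literal substring
(space) - a literal space
(\w+) - a capturing group that matches one or more letters, digits, or _ symbols (you may not even need it: if you don't need access to this submatch, remove the ( and ))
, (comma and space) - a comma followed by a space
LA - a literal substring
$ - end of string.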
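If you do keep the capturing group, one way to read the submatch is regexec() combined with regmatches(), which returns the full match followed by each captured group. A minimal sketch, assuming you want the word after the county name (e.g. "Parish") and NA where a string does not match; county_type is an illustrative name:

```r
strings <- c("High School Graduate or Higher (5-year estimate) in Jefferson Parish, LA",
             "High School Graduate or Higher (5-year estimate) in Jefferson Davis Parish, LA")
pattern <- "Jefferson (\\w+), LA$"

# regexec() records the whole match plus each capture group;
# regmatches() turns that into a list of character vectors
# (character(0) for strings that do not match)
m <- regmatches(strings, regexec(pattern, strings, ignore.case = TRUE))

# Element 2 of each match is the (\w+) group; use NA when there was no match
county_type <- vapply(m, function(x) if (length(x) > 1) x[2] else NA_character_,
                      character(1))

print(county_type)  # "Parish" NA
```

If you drop the parentheses as suggested above, grepl() with the same pattern is enough to test for a match.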
