Question

The following regex search gives wrong results for some strings.

str_extract_all("This Dose was given to him in the U.S. on 16 June",regex("(\\b(Baseline)\\b|\\b(Table)\\b|\\b(U.S.)\\b|\\b(D.S.)\\b)",ignore_case = TRUE))

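It returns Dose as a match for the D.S. alternative, which it should not. It also fails to match U.S., even though U.S. is part of the pattern and appears in the text.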

I wrapped each alternative in \\b...\\b so that only the exact chunk of the pattern is searched.

Is there anything incorrect in the search above?

Answer 1

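You should

1) escape the dots, since an unescaped . matches any character (which is why D.S. matches Dose),
2) reorganize the regex so that it has no trailing \b: a \b placed after a dot requires a word char right after the dot, so it fails when U.S. is followed by a space. In these cases a (?!\w) negative lookahead is more appropriate (or, if you only want to match before whitespace or the end of the string, use (?!\S)).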
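The two problems can be reproduced in isolation. The sketch below is only an illustration: it uses base R with perl = TRUE (PCRE) instead of stringr so that it runs without extra packages, and extract_all is a hypothetical helper name, not part of any library.

```r
x <- "This Dose was given to him in the U.S. on 16 June"

# Hypothetical helper: all case-insensitive PCRE matches of `pattern` in `text`.
extract_all <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE, ignore.case = TRUE))[[1]]
}

# Unescaped dots: "D.S." means D, any char, S, any char, so "Dose" qualifies.
unescaped <- extract_all("\\bD.S.\\b", x)

# Escaped dots with a trailing \b: "." and the following space are both
# non-word chars, so there is no word boundary and the match fails.
trailing_b <- extract_all("\\bU\\.S\\.\\b", x)

# Escaped dots with a (?!\w) lookahead instead of the trailing \b.
fixed <- extract_all("\\bU\\.S\\.(?!\\w)", x)

print(unescaped)
print(trailing_b)
print(fixed)
```

Here unescaped contains the unwanted Dose, trailing_b is empty, and fixed contains U.S.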

Use

> library(stringr)
> x <- "This Dose was given to him in the U.S. on 16 June"
> pattern <- "\\b(?:Baseline|Table|U\\.S\\.|D\\.S\\.)(?!\\w)"
> str_extract_all(x, regex(pattern,ignore_case = TRUE))
[[1]]
[1] "U.S."

See the regex demo.

Details

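\b - a leading word boundary (since all the alternatives start with a word char, using \b is appropriate here; otherwise, consider changing it to (?<!\w) or (?<!\S), negative lookbehinds that fail the match if there is a word / non-whitespace char immediately to the left of the current location)
(?:Baseline|Table|U\.S\.|D\.S\.) - one of the alternative substrings: Baseline, Table, U.S. or D.S.
(?!\w) - a negative lookahead that fails the match if there is a word char immediately to the right of the current location.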
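To make the difference between the boundary variants concrete, here is a small sketch. The sample strings y and z and the .NET alternative are made up for illustration, extract_all is a hypothetical helper, and base R with perl = TRUE stands in for stringr.

```r
# Hypothetical helper: all case-insensitive PCRE matches of `pattern` in `text`.
extract_all <- function(pattern, text) {
  regmatches(text, gregexpr(pattern, text, perl = TRUE, ignore.case = TRUE))[[1]]
}

y <- "Sold in the U.S., the U.S.A and the U.S. only"

# (?!\w): allowed before punctuation or whitespace, rejected before "A".
before_nonword <- extract_all("\\bU\\.S\\.(?!\\w)", y)

# (?!\S): allowed only before whitespace or the end of the string.
before_space <- extract_all("\\bU\\.S\\.(?!\\S)", y)

z <- "He uses .NET daily"

# A leading \b fails when the alternative starts with a non-word char.
leading_b <- extract_all("\\b\\.NET\\b", z)

# A (?<!\w) lookbehind works regardless of the first char of the alternative.
leading_lookbehind <- extract_all("(?<!\\w)\\.NET(?!\\w)", z)

print(before_nonword)
print(before_space)
print(leading_b)
print(leading_lookbehind)
```

With (?!\w), both "U.S.," and "U.S. only" yield a match, while (?!\S) keeps only the one followed by a space. Which lookahead to pick depends on whether trailing punctuation should count as a valid end of the match.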
