Question

I have an array

["killed", "thanks", "thank","+", "?", "]", "[", "(", ")", "*"]

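I want to remove some strings from a text if they are included in the array.

With this example, I want to remove "beauty" but not "beautiful". How can I do that?

I tried using a regular expression: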

text = "thanks john, Have you ever killed someone ?"
arry_words = ["killed", "thanks", "+", "?", "]", "[", "(", ")", "*"]
text.downcase.gsub(Regexp.new("\\b+(?:#{arry_words.join('|')})\\b"), '').strip

But I get some errors:

RegexpError: target of repeat operator is not specified
RegexpError: end pattern with unmatched parenthesis: ...

Answer 1

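Your search terms contain special regex metacharacters, which must be escaped before they are used in a regex pattern. This is easily done with Regexp.union, which joins the terms with | and escapes the special characters.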

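Returns a Regexp object that is the union of the given patterns, i.e., will match any of its parts. The patterns can be Regexp objects, in which case their options will be preserved, or strings.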
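As a minimal sketch of the difference (reusing the array from the question; the variable names naive_error and union are only illustrative), joining the raw strings fails to compile, while Regexp.union escapes each string first:

```ruby
arry_words = ["killed", "thanks", "+", "?", "]", "[", "(", ")", "*"]

# Joining the raw strings leaves "+", "?", "(" etc. unescaped, so compiling fails.
naive_error = begin
  Regexp.new(arry_words.join('|'))
  nil
rescue RegexpError => e
  e
end
puts "naive join failed: #{naive_error.class}" if naive_error

# Regexp.union escapes each string before joining the alternatives with |.
union = Regexp.union(arry_words)
puts "a+b (c)".gsub(union, '')   # the literal "+", "(" and ")" are removed
```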

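Another problem is that the \b word boundary is context-dependent: placed before a non-word character, it requires a word character immediately before that character; placed after a non-word character, it requires a word character immediately after it.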
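A small illustration of that context dependence, using made-up sample strings: \b next to "?" only matches when a word character sits on the other side of the boundary.

```ruby
# \b matches only between a word character and a non-word character.
space_before = "someone ?".match?(/\b\?/)   # " " then "?": no boundary
word_before  = "someone?".match?(/\b\?/)    # "e" then "?": boundary
end_after    = "someone ?".match?(/\?\b/)   # "?" then end of string: no boundary
word_after   = "someone ?x".match?(/\?\b/)  # "?" then "x": boundary

p [space_before, word_before, end_after, word_after]
#=> [false, true, false, true]
```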

Use

text.downcase.gsub(/(?<!\w)(?:#{Regexp.union(arry_words)})(?!\w)/, '').strip

See this Ruby demo.

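(?<!\w) ensures that there is no word character immediately before the search term, and (?!\w) ensures that there is no word character immediately after it.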

Note: in this case, punctuation will not be matched when it is attached to a word character.
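To make the note concrete, here is a sketch that runs the lookaround pattern on the question's text and on a variant (my own, for illustration) where "?" is attached to the preceding word:

```ruby
arry_words = ["killed", "thanks", "+", "?", "]", "[", "(", ")", "*"]
rx = /(?<!\w)(?:#{Regexp.union(arry_words)})(?!\w)/

# "?" stands alone here, so it is removed along with the words.
standalone = "thanks john, Have you ever killed someone ?".downcase.gsub(rx, '').strip
p standalone  #=> "john, have you ever  someone"

# "?" is attached to "someone" here, so the lookbehind rejects it.
attached = "thanks john, Have you ever killed someone?".downcase.gsub(rx, '').strip
p attached    #=> "john, have you ever  someone?"
```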

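If you still need to use \b as the word boundary, you first need to analyze the search-term array and add \b only where a search term starts or ends with a word character. Something like this: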

arry_words = arry_words.map { |x|
    # Add \b only on the side(s) where the term starts/ends with a word character.
    # Checking each side separately also handles single-character terms.
    prefix = x.match?(/\A\w/) ? "\\b" : ""
    suffix = x.match?(/\w\z/) ? "\\b" : ""
    prefix + Regexp.escape(x) + suffix
}
rx = Regexp.new(arry_words.join('|'))
text.downcase.gsub(rx, '').strip

See this Ruby demo.

Answer 2

Try this

regexp = Regexp.union(words)

or, with word boundaries

regexp = /\b(#{Regexp.union(words).source})\b/i

Regexp.union escapes the special characters in the strings, which avoids the errors you ran into.
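One detail worth sketching is why .source is used: interpolating the Regexp object itself embeds it as a (?-mix:...) group, which switches the outer /i flag off inside that group. The words array and sample text below are assumptions chosen for illustration.

```ruby
words = ["killed", "thanks", "+", "?"]
text  = "Thanks John, have you ever killed someone?"

# With .source only the pattern text is interpolated, so /i applies to it.
with_source = text.gsub(/\b(#{Regexp.union(words).source})\b/i, '')
p with_source     #=> " John, have you ever  someone?"

# Without .source the union keeps its own (case-sensitive) options.
without_source = text.gsub(/\b(#{Regexp.union(words)})\b/i, '')
p without_source  #=> "Thanks John, have you ever  someone?"
```

In both cases the trailing "?" stays, because \b after "?" needs a word character to follow it.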

Answer 3

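I would do it in two steps. That avoids having to escape certain characters and having to add word boundaries depending on where they would be placed.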

arry_words = ["kissed", "kill", "+", "?", "]", "[", "(", ")", "*"]

word_words = arry_words.select { |w| w =~ /\A[[:alpha:]]+\z/ }
  #=> ["kissed", "kill"]

r = /\b#{Regexp.union(word_words)}\b/i
  #=> /\b(?-mix:kissed|kill)\b/i

text = "Thanks John, have you (ever) kissed and killed so[me**one?"

text.delete((arry_words-word_words).join).gsub(r, '')
  #=> "Thanks John, have you ever  and killed someone"

Note that

text.delete((arry_words-word_words).join)
  #=> "Thanks John, have you ever kissed and killed someone"
