Strange output from Ruby Regexp.escape

Time: 2016-02-12 03:39:53

Tags: ruby regex

irb(main):021:0> TEXT = <<EOF
irb(main):022:0" Text messaging, or texting, is the act of composing 
and sending electronic messages between two or more mobile phones, or fixed or 
portable devices over a phone network. The term originally referred to messages 
sent using the Short Message Service (SMS). 
It has grown to include multimedia messages (known as MMS) containing images, videos, 
and sound content, as well as ideograms known as emoji
irb(main):023:0" EOF
=> "Text messaging, or texting, is the act of composing and sending electronic 
messages between two or more mobile phones, or fixed or portable devices over a 
phone network. The term originally referred to messages sent using the Short 
Message Service (SMS). It has grown to include multimedia messages (known as MMS) 
containing images, videos, and sound content, as well as ideograms known as emoji\n"

irb(main):064:0> LEXICON = {
irb(main):065:1*   "Text" => "WRITING",
irb(main):066:1*   "is" => "WAS"
irb(main):067:1> }
irb(main):070:0> sanitized = TEXT.gsub(pattern, LEXICON)
=> "WRITING messaging, or texting, WAS the act of composing and sending electronic messages between two or more mobile phones, or fixed or portable devices over a phone network. The term originally referred to messages sent using the Short Message Service (SMS). It has grown to include multimedia messages (known as MMS) containing images, videos, and sound content, as well as ideograms known as emoji\n"
irb(main):071:0> terms = LEXICON.keys.map {|k| Regexp.new(Regexp.escape(k))}.join("|")
=> "(?-mix:Text)|(?-mix:is)"    <------------ What is this (?-mix:...) thing?

irb(main):072:0> sanitized = TEXT.gsub(pattern, LEXICON)
=> "WRITING messaging, or texting, WAS the act of composing and sending electronic 
messages between two or more mobile phones, or fixed or portable devices over a 
phone network. The term originally referred to messages sent using the Short 
Message Service (SMS). It has grown to include multimedia messages (known as MMS) 
containing images, videos, and sound content, 
as well as ideograms known as emoji\n"

I'm watching Ruby Tapas, episode 190, and tried it out in an IRB session. What caught my attention is that the expression returns (?-mix:key)|(?-mix:key). Can someone explain what that means?

Thanks

1 Answer:

Answer 0: (score: 0)

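This is covered in the docs: (?-mix:...) turns off the m, i, and x options for the subexpression inside the parentheses. It appears here because join converts each Regexp to a string with Regexp#to_s, which writes the option flags out explicitly.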
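To make the source of the prefix concrete, here is a small sketch. The local variable lexicon mirrors LEXICON from the question; the other names are only for illustration. It compares joining Regexp objects with joining plain escaped strings, and shows how an enabled option moves to the left of the dash.

```ruby
# Mirrors LEXICON from the question (a local variable here, for illustration).
lexicon = { "Text" => "WRITING", "is" => "WAS" }

# Wrapping each key in Regexp.new yields Regexp objects; join calls to_s on each.
as_regexps = lexicon.keys.map { |k| Regexp.new(Regexp.escape(k)) }.join("|")

# Escaping alone keeps the keys as plain strings, so no option prefix appears.
as_strings = lexicon.keys.map { |k| Regexp.escape(k) }.join("|")

puts as_regexps  # (?-mix:Text)|(?-mix:is)
puts as_strings  # Text|is

# An option that is switched on is listed before the dash.
with_ignore_case = /text/i.to_s
puts with_ignore_case  # (?i-mx:text)
```

Either string can still be interpolated into a larger pattern; the (?-mix:...) form simply guarantees that each piece keeps its own options wherever it is embedded.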

P.S. There is a simpler way to write this:

LEXICON.keys.map {|k| Regexp.new(Regexp.escape(k)) }.join("|")

which is:

Regexp.union(LEXICON.keys)

That assumes you ultimately want a Regexp rather than a String. See Regexp.union.
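As a rough sketch of how Regexp.union could slot into the substitution from the question: the shortened sample text and the \b word boundaries below are assumptions, since the question never shows how pattern was built.

```ruby
# Mirrors LEXICON from the question; the sample text is shortened for illustration.
lexicon = { "Text" => "WRITING", "is" => "WAS" }
text = "Text messaging, or texting, is the act of composing messages."

# Regexp.union escapes each string and joins the alternatives into one Regexp.
union = Regexp.union(lexicon.keys)

# Assumption: wrap the union in word boundaries so only whole words are replaced.
# Interpolating a Regexp inserts its (?-mix:...) form, which also groups the alternation.
pattern = /\b#{union}\b/

# With a Hash as the second argument, gsub looks up each matched string as a key.
result = text.gsub(pattern, lexicon)
puts result  # WRITING messaging, or texting, WAS the act of composing messages.
```

Because the interpolated union arrives as a single group, the trailing \b applies to every alternative rather than only to the last one.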