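Ruby: automating multiple regex replacements

Date: 2012-04-30 09:32:56

Tags: ruby regex

I want to apply several regex replacements to every string in an array. I have the working code below, but it doesn't feel like the Ruby way. Does anyone have a better solution?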

# files contains the strings that need cleaning
files = [
  "Beatles - The Word ",
  "The Beatles - The Word",
  "Beatles - Tell Me Why",
  "Beatles - Tell Me Why (remastered)",
  "Beatles - Love me do"
]

# ignore contains the regular expressions to strip from each string
ignore = [/the/,/\(.*\)/,/remastered/,/live/,/remix/,/mix/,/acoustic/,/version/,/  +/]

files.each do |file|
  ignore.each do |e|
    file.downcase!
    file.gsub!(e," ")
    file.strip!
  end
end
p files
#=>["beatles - word", "beatles - word", "beatles - tell me why", "beatles - tell me why", "beatles - love me do"]

3 answers:

Answer 0 (score: 3)

ignore = ["the", "(", ".",  "*", ")", "remastered", "live", "remix",  "mix", "acoustic", "version", "+"]
re = Regexp.union(ignore)
p re #=> /the|\(|\.|\*|\)|remastered|live|remix|mix|acoustic|version|\+/

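Regexp.union takes care of the escaping.

Answer 1 (score: 1)

You can fold most of this into a single regex substitution. You should also use word-boundary anchors (\b); otherwise, for example, the would also match inside There's a Place.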
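
Regexp.union also accepts Regexp objects alongside plain strings: strings are escaped, patterns are embedded as they are. Here is a minimal sketch of how that could be applied to the question's data. The helper name clean and the split into words and patterns are assumptions made for illustration, not part of the answer above.

```ruby
# Literal words to drop (escaped automatically by Regexp.union).
words = %w[the remastered live remix mix acoustic version]

# Ready-made patterns, embedded unchanged: a parenthesised chunk such as "(remastered)".
patterns = [/\([^()]*\)/]

ignore = Regexp.union(words + patterns)

# Hypothetical helper: returns a cleaned copy and leaves the argument untouched.
def clean(title, ignore)
  title.downcase.gsub(ignore, " ").squeeze(" ").strip
end

p clean("Beatles - Tell Me Why (remastered)", ignore)
#=> "beatles - tell me why"
```

Note that without word boundaries the plain words still match inside longer words, which is what the next answer addresses.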

file.gsub!(/(?:\b(?:the|remastered|live|remix|mix|acoustic|version)\b)|\([^()]*\)/, ' ')

That should take care of it.

You can then collapse runs of spaces in a second step:

file.gsub!(/  +/, ' ')
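
Put together, the two substitutions could look like the sketch below. It uses the non-destructive gsub and map so the input array is left alone; the constant IGNORE, the method name clean_title and the sample titles are assumptions chosen for illustration.

```ruby
# Single pattern: listed words with word boundaries, or a parenthesised chunk.
IGNORE = /\b(?:the|remastered|live|remix|mix|acoustic|version)\b|\([^()]*\)/

# Hypothetical helper: lowercases, removes ignored parts, collapses spaces, trims.
def clean_title(title)
  title.downcase
       .gsub(IGNORE, " ")
       .gsub(/ {2,}/, " ")
       .strip
end

files = [
  "Beatles - The Word ",
  "The Beatles - Tell Me Why (remastered)",
  "Beatles - There's a Place"
]

cleaned = files.map { |f| clean_title(f) }
p cleaned
#=> ["beatles - word", "beatles - tell me why", "beatles - there's a place"]
```

Thanks to \b, the word the is removed on its own but left alone inside there's.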

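If you want to keep the regexes in an array, you need to iterate over the array and perform one substitution per regex. But you can at least move some of the calls out of the inner loop: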

files.each do |file|
  file.downcase!
  ignore.each do |e|
    file.gsub!(e," ")
  end
  file.strip!
end

Of course, you then need to add word boundaries around each word in the ignore list:

ignore = [/\bthe\b/, /\([^()]*\)/, /\bremastered\b/, ...]
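
Rather than typing each bounded pattern by hand, the array could be generated from a word list. This is only a sketch: the variable words and the sample title are assumptions, and Regexp.escape is there in case a word ever contains a regex metacharacter.

```ruby
words = %w[the remastered live remix mix acoustic version]

# One word-bounded regex per word, plus the parenthesised-chunk pattern.
ignore = words.map { |w| /\b#{Regexp.escape(w)}\b/ } << /\([^()]*\)/

title = "there's a place (live)".dup
ignore.each { |re| title.gsub!(re, " ") }
title.strip!

p title
#=> "there's a place"
```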

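Answer 2 (score: 0)

I arrived at this solution from your answers. There are two versions: one joins everything into a single string (and does not modify the files array), the other extends Array and does modify the files array itself. The class approach is about 2x faster. If anyone still has suggestions, please share.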

files = [
  "Beatles - The Word ",
  "The Beatles - The Word",
  "Beatles - Tell Me Why",
  "The Beatles - Tell Me Why (remastered)",
  "Beatles - wordwiththein wordwithlivein"
]

ignore = /\(.*\)|[_]|\b(the|remastered|live|remix|mix|acoustic|version)\b/

class Array
  def cleanup ignore
    self.each do |e|
      e.downcase!
      e.gsub!(ignore," ")
      e.gsub!(/  +/," ")
      e.strip!
    end
  end
end

# downcase (not downcase!) is used here: downcase! returns nil when nothing changes
p files.join("#").downcase.gsub(ignore," ").gsub(/  +/," ").split(/ *# */)
#=>["beatles - word", "beatles - word", "beatles - tell me why", "beatles - tell me why", "beatles - wordwiththein wordwithlivein"]

require 'benchmark'

Benchmark.bm do |x|
  # downcase (not downcase!) avoids a nil result if the strings are already lowercase
  x.report("string method")  { 10000.times { files.join("#").downcase.gsub(ignore," ").gsub(/  +/," ").split(/ *# */) } }
  x.report("class  method")   { 10000.times { files.cleanup ignore } }
end

=begin
       user     system      total        real
string method  0.328000   0.000000   0.328000 (  0.327600)
class  method  0.187000   0.000000   0.187000 (  0.187200)
=end
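
One caveat about the comparison: cleanup modifies files in place, so after its first iteration the strings are already clean and the later iterations have less to do. Below is a sketch of a like-for-like benchmark in which neither version mutates its input. The method names clean_joined and clean_each and the constants are assumptions for illustration; no timings are claimed here.

```ruby
require "benchmark"

IGNORE = /\(.*\)|[_]|\b(the|remastered|live|remix|mix|acoustic|version)\b/

FILES = [
  "Beatles - The Word ",
  "The Beatles - The Word",
  "Beatles - Tell Me Why",
  "The Beatles - Tell Me Why (remastered)",
  "Beatles - wordwiththein wordwithlivein"
].freeze

# Joins all titles into one string, cleans it, then splits it again.
def clean_joined(files)
  files.join("#").downcase.gsub(IGNORE, " ").gsub(/  +/, " ").split(/ *# */)
end

# Cleans each title separately and returns a new array.
def clean_each(files)
  files.map { |f| f.downcase.gsub(IGNORE, " ").gsub(/  +/, " ").strip }
end

Benchmark.bm(14) do |x|
  x.report("joined string") { 10_000.times { clean_joined(FILES) } }
  x.report("per element")   { 10_000.times { clean_each(FILES) } }
end
```

The joined variant also relies on # never appearing in a title and on the greedy \(.*\) not spanning two titles, so the per-element version is the safer of the two.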