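I want to apply several regular-expression substitutions to every string in an array. I have this working code, but it doesn't feel like the Ruby way. Does anyone have a better solution?
# files contains the strings that need cleaning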
files = [
"Beatles - The Word ",
"The Beatles - The Word",
"Beatles - Tell Me Why",
"Beatles - Tell Me Why (remastered)",
"Beatles - Love me do"
]
# ignore contains the regular expressions that need to be checked
ignore = [/the/,/\(.*\)/,/remastered/,/live/,/remix/,/mix/,/acoustic/,/version/,/ +/]
files.each do |file|
  ignore.each do |e|
    file.downcase!
    file.gsub!(e," ")
    file.strip!
  end
end
p files
#=>["beatles - word", "beatles - word", "beatles - tell me why", "beatles - tell me why", "beatles - love me do"]
Answer 0 (score: 3)
ignore = ["the", "(", ".", "*", ")", "remastered", "live", "remix", "mix", "acoustic", "version", "+"]
re = Regexp.union(ignore)
p re #=> /the|\(|\.|\*|\)|remastered|live|remix|mix|acoustic|version|\+/
Regexp.union takes care of the escaping.
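One thing worth checking with this approach: strings passed to Regexp.union are matched literally, so "(", ".", "*" and ")" no longer behave like the original /\(.*\)/ pattern. The sketch below is only an illustration (the sample title is made up) of the difference, relying on the fact that Regexp.union also accepts Regexp objects and leaves them unescaped.

```ruby
# Strings are escaped, so each of these matches one literal character.
literal = Regexp.union("(", ".", "*", ")")

# Regexp arguments are kept as patterns; strings are still escaped.
pattern = Regexp.union(/\(.*\)/, "remastered")

title = "tell me why (2009 remaster)"   # hypothetical sample title

p title.gsub(literal, " ")         # only the brackets are replaced; "2009 remaster" survives
p title.gsub(pattern, " ").strip   # the whole parenthesised group is removed
```

Mixing both kinds of argument keeps the convenience of automatic escaping for plain words while still allowing real patterns where they are needed.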
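Answer 1 (score: 1)
You can fold most of this into a single regex substitution. You should also use word-boundary anchors (\b); otherwise "the", for example, would also match inside "There's a Place".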
file.gsub!(/(?:\b(?:the|remastered|live|remix|mix|acoustic|version)\b)|\([^()]*\)/, ' ')
should take care of that.
You can then collapse runs of spaces in a second step:
file.gsub!(/ +/, ' ')
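To see what the word boundaries buy you, here is a minimal sketch of the two steps wrapped in a helper. The method name clean_title and the sample title are assumptions made for illustration; the regex is the one from this answer.

```ruby
# Combined pattern from the answer: whole words only, plus one (...) group.
ignore = /(?:\b(?:the|remastered|live|remix|mix|acoustic|version)\b)|\([^()]*\)/

# Hypothetical helper: returns a cleaned copy instead of mutating its argument.
def clean_title(title, ignore)
  title.downcase
       .gsub(ignore, " ")   # step 1: drop ignored words and parenthesised text
       .gsub(/ +/, " ")     # step 2: collapse runs of spaces
       .strip
end

p clean_title("The Beatles - There's a Place (live)", ignore)
#=> "beatles - there's a place"

# Without \b, the unanchored pattern eats the start of "there's":
p "there's a place".gsub(/the/, " ")
#=> " re's a place"
```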
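If you want to keep the regexes in an array, you have to iterate over the array and run a substitution for each regex. But you can at least move some of the calls out of the inner loop:
files.each do |file|
  file.downcase!
  ignore.each do |e|
    file.gsub!(e," ")
  end
  file.strip!
end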
Of course, you then need to add word boundaries around each word in the ignore list:
ignore = [/\bthe\b/, /\([^()]*\)/, /\bremastered\b/, ...]
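Writing the boundaries by hand for every word gets repetitive, so one option is to generate the array from a plain word list. This is only a sketch: the names words and clean_all are hypothetical, and it returns new strings rather than mutating files in place.

```ruby
words = %w[the remastered live remix mix acoustic version]

# Wrap each word in \b anchors; Regexp.escape guards against metacharacters
# in case the word list ever contains any.
ignore = words.map { |w| /\b#{Regexp.escape(w)}\b/ }
ignore << /\([^()]*\)/   # parenthesised text, handled after the words

# Hypothetical helper: apply every regex in turn to a downcased copy.
def clean_all(files, ignore)
  files.map do |file|
    cleaned = ignore.inject(file.downcase) { |s, re| s.gsub(re, " ") }
    cleaned.gsub(/ +/, " ").strip
  end
end

p clean_all(["Beatles - The Word ", "Beatles - Tell Me Why (remastered)"], ignore)
#=> ["beatles - word", "beatles - tell me why"]
```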
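Answer 2 (score: 0)
Based on your answers I arrived at this solution, in two versions: one converts the array to a single string (and does not change the files array), the other extends Array (and does change the files array itself). The class approach is roughly twice as fast in my benchmark. If anyone still has suggestions, please share.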
files = [
"Beatles - The Word ",
"The Beatles - The Word",
"Beatles - Tell Me Why",
"The Beatles - Tell Me Why (remastered)",
"Beatles - wordwiththein wordwithlivein"
]
ignore = /\(.*\)|[_]|\b(the|remastered|live|remix|mix|acoustic|version)\b/
class Array
  def cleanup ignore
    self.each do |e|
      e.downcase!
      e.gsub!(ignore," ")
      e.gsub!(/ +/," ")
      e.strip!
    end
  end
end
# downcase (not downcase!) is used because downcase! returns nil when nothing changes
p files.join("#").downcase.gsub(ignore," ").gsub(/ +/," ").split(/ *# */)
#=>["beatles - word", "beatles - word", "beatles - tell me why", "beatles - tell me why", "beatles - wordwiththein wordwithlivein"]
require 'benchmark'
Benchmark.bm do |x|
  x.report("string method") { 10000.times { files.join("#").downcase.gsub(ignore," ").gsub(/ +/," ").split(/ *# */) } }
  x.report("class method") { 10000.times { files.cleanup ignore } }
end
=begin
                    user     system      total        real
string method   0.328000   0.000000   0.328000 (  0.327600)
class method    0.187000   0.000000   0.187000 (  0.187200)
=end
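One caveat about the join/split version: because all titles end up on one line, the greedy \(.*\) can run from the first opening bracket to the last closing bracket and swallow every title in between. The sketch below uses made-up titles to show the effect and compares it with a per-element map, which also leaves the original array untouched without reopening Array. Treat it as an illustration under those assumptions, not a benchmarked replacement.

```ruby
ignore = /\(.*\)|[_]|\b(the|remastered|live|remix|mix|acoustic|version)\b/

titles = ["A (live)", "B", "C (remix)"]   # hypothetical sample data

# Joined approach: \(.*\) matches from "(live" all the way to "remix)".
joined = titles.join("#").downcase.gsub(ignore, " ").gsub(/ +/, " ").split(/ *# */)

# Per-element approach: each title is cleaned on its own.
mapped = titles.map { |t| t.downcase.gsub(ignore, " ").gsub(/ +/, " ").strip }

p joined.length   #=> 1
p mapped          #=> ["a", "b", "c"]
```

A title that itself contains "#" would break the joined version in a similar way, since the separator is no longer unique.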