How do I remove certain characters from a string?

Asked: 2014-03-17 00:21:30

Tags: ruby regex arrays hash

If I have a string:

text = "1st RULE: You do not talk about FIGHT CLUB.
2nd RULE: You DO NOT talk about FIGHT CLUB.
3rd RULE: If someone says 'stop' or goes limp, taps out the fight is over.
4th RULE: Only two guys to a fight.
5th RULE: One fight at a time.
6th RULE: No shirts, no shoes.
7th RULE: Fights will go on as long as they have to.
8th RULE: If this is your first night at FIGHT CLUB, you HAVE to fight."

How do I remove certain characters from it? What I want to do is remove every character that matches /[0-9:;!\?(){}%$#@*-,.<>"'+=]/.

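I want to count the frequency of words in the text, but these pesky little characters are giving me problems. I thought the best solution would be to use: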

text.map! do |word|
  if word =~ /[:;!\?(){}%$#@*-,.<>"'+=]/
     #do something that removes the character....
  end
end

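I can't seem to find a suitable solution. I looked through the documentation and took a swing at delete_if, delete_at, and drop_while, but they seem to delete the entire element (the word), which is not what I want. I also tried string methods such as gsub, but it didn't work the way I thought it should.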

Can someone point me in the right direction? I don't want to delete the whole element, only the instances that match.

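Now that I think about it, gsub could work, but would I replace the match with a space? That would cause problems if the character I'm trying to replace is in the middle of a string.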

I plan on storing them in a hash.

2 Answers:

Answer 0: (score: 3)

[Converting my comment into an answer]

Use:

text.gsub(/[0-9:;!\?(){}%$#@*,.<>"'+=-]/, '')

This removes the characters. Note that I moved the - to the end so that it isn't interpreted as a range.
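To see why the position of the - matters, here is a small sketch (the variable names and sample string are mine, not from the question). Inside a character class, `*-,` is read as the range from `*` to `,`, which in ASCII also covers `+`, while a literal hyphen is not matched at all:

```ruby
# Hypothetical names and sample data, for illustration only.
range_class   = /[*-,]/   # '-' between two characters: the range '*'..','
literal_class = /[*,-]/   # '-' at the end of the class: a literal hyphen

sample = "a-b+c*d,e"

with_range   = sample.gsub(range_class, '')    # => "a-bcde"  ('+' lies between '*' and ','; '-' is kept)
with_literal = sample.gsub(literal_class, '')  # => "ab+cde"  ('-' is removed; '+' is kept)
```

In the question's pattern the `+` was listed separately anyway, so the visible difference there is that hyphens would have been left in place.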

You can simplify the regex by using a negated set (specifying the characters you want to keep):

text.gsub(/[^a-z ]/i, '')
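For the original goal of counting word frequencies into a hash, one possible sketch follows. It assumes case-insensitive counting and adds `\n` to the negated set so that words on adjacent lines are not glued together; the names `cleaned` and `frequencies` and the shortened two-line sample are illustrative only.

```ruby
# Shortened sample of the text from the question.
text = "1st RULE: You do not talk about FIGHT CLUB.\n" \
       "2nd RULE: You DO NOT talk about FIGHT CLUB."

# Delete everything except letters, spaces, and newlines.
cleaned = text.gsub(/[^a-z \n]/i, '')

# Split on whitespace and count each downcased word in a hash.
frequencies = Hash.new(0)
cleaned.downcase.split.each { |word| frequencies[word] += 1 }

frequencies["fight"]  # => 2
frequencies["rule"]   # => 2
```

Replacing matches with an empty string rather than a space keeps a word in one piece when punctuation appears in the middle of it, so "don't" becomes "dont" instead of two separate tokens.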

Answer 1: (score: 1)

Here are some interesting benchmark results for the patterns:

require 'fruity'

text = "1st RULE: You do not talk about FIGHT CLUB.
2nd RULE: You DO NOT talk about FIGHT CLUB.
3rd RULE: If someone says 'stop' or goes limp, taps out the fight is over.
4th RULE: Only two guys to a fight.
5th RULE: One fight at a time.
6th RULE: No shirts, no shoes.
7th RULE: Fights will go on as long as they have to.
8th RULE: If this is your first night at FIGHT CLUB, you HAVE to fight."

INCLUSIVE_PATTERN_STRING = %/[0-9:;!?(){}%$#@*,.<>"'+=-]/
EXCLUSIVE_PATTERN_STRING = %/[^a-z\n ]/

text.gsub(/#{ INCLUSIVE_PATTERN_STRING }/i, '')  # => "st RULE You do not talk about FIGHT CLUB\nnd RULE You DO NOT talk about FIGHT CLUB\nrd RULE If someone says stop or goes limp taps out the fight is over\nth RULE Only two guys to a fight\nth RULE One fight at a time\nth RULE No shirts no shoes\nth RULE Fights will go on as long as they have to\nth RULE If this is your first night at FIGHT CLUB you HAVE to fight"
text.gsub(/#{ EXCLUSIVE_PATTERN_STRING }/i, '')  # => "st RULE You do not talk about FIGHT CLUB\nnd RULE You DO NOT talk about FIGHT CLUB\nrd RULE If someone says stop or goes limp taps out the fight is over\nth RULE Only two guys to a fight\nth RULE One fight at a time\nth RULE No shirts no shoes\nth RULE Fights will go on as long as they have to\nth RULE If this is your first night at FIGHT CLUB you HAVE to fight"
text.gsub(/#{ INCLUSIVE_PATTERN_STRING }+/i, '') # => "st RULE You do not talk about FIGHT CLUB\nnd RULE You DO NOT talk about FIGHT CLUB\nrd RULE If someone says stop or goes limp taps out the fight is over\nth RULE Only two guys to a fight\nth RULE One fight at a time\nth RULE No shirts no shoes\nth RULE Fights will go on as long as they have to\nth RULE If this is your first night at FIGHT CLUB you HAVE to fight"
text.gsub(/#{ EXCLUSIVE_PATTERN_STRING }+/i, '') # => "st RULE You do not talk about FIGHT CLUB\nnd RULE You DO NOT talk about FIGHT CLUB\nrd RULE If someone says stop or goes limp taps out the fight is over\nth RULE Only two guys to a fight\nth RULE One fight at a time\nth RULE No shirts no shoes\nth RULE Fights will go on as long as they have to\nth RULE If this is your first night at FIGHT CLUB you HAVE to fight"

compare do
  inclusive        { text.gsub(/#{ INCLUSIVE_PATTERN_STRING }/i, '')  }
  exclusive        { text.gsub(/#{ EXCLUSIVE_PATTERN_STRING }/i, '')  }
  greedy_inclusive { text.gsub(/#{ INCLUSIVE_PATTERN_STRING }+/i, '') }
  greedy_exclusive { text.gsub(/#{ EXCLUSIVE_PATTERN_STRING }+/i, '') }
end

Results:

Running each test 128 times. Test will take about 1 second.
inclusive is faster than exclusive by 30.000000000000004% ± 1.0%
exclusive is faster than greedy_exclusive by 10.000000000000009% ± 1.0%
greedy_exclusive is faster than greedy_inclusive by 10.000000000000009% ± 1.0%
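If the fruity gem isn't available, a rough comparison can be sketched with the standard library's Benchmark module instead. This is an assumption-laden sketch, not a reproduction of the numbers above: the variable names, the shortened sample text, and the iteration count are all mine, and timings will vary with the machine and Ruby version.

```ruby
require 'benchmark'

# Shortened sample of the text from the question.
text = "1st RULE: You do not talk about FIGHT CLUB.\n" \
       "2nd RULE: You DO NOT talk about FIGHT CLUB."

# Regexp literals equivalent to the two pattern strings above (names are hypothetical).
inclusive_re = /[0-9:;!?(){}%$#@*,.<>"'+=-]/i
exclusive_re = /[^a-z\n ]/i

iterations = 1_000  # arbitrary; raise it for steadier timings

# Benchmark.realtime returns the elapsed wall-clock time in seconds.
inclusive_time = Benchmark.realtime { iterations.times { text.gsub(inclusive_re, '') } }
exclusive_time = Benchmark.realtime { iterations.times { text.gsub(exclusive_re, '') } }

puts format("inclusive: %.4fs, exclusive: %.4fs", inclusive_time, exclusive_time)
```

Unlike fruity's compare, this reports raw elapsed times rather than a relative ranking with an error margin, so small differences should be read with caution.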