Ruby: reduce all whitespace to a single space

Time: 2011-01-11 20:01:10

Tags: ruby

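I'm not sure how to do this, since I'm new to regular expressions and can't seem to find a suitable way, but say I have the following string (with all the tabs and newlines included):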

1/2 cup  




            onion           
             (chopped)

How do I remove all of this whitespace and replace each run of it with a single space?

6 answers:

Answer 0 (score: 61)

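This is a case where regular expressions work well, because you want to treat the whole class of whitespace characters the same way and replace each run of any combination of them with a single space character. So if that string is stored in s, you would do: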

fixed_string = s.gsub(/\s+/, ' ')
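A minimal runnable sketch of this answer applied to the question's sample string. The helper name collapse_whitespace is hypothetical, and the trailing strip is an addition not in the answer: without it, the leading and trailing whitespace runs each become a single space instead of disappearing.

```ruby
# Hypothetical helper: collapse every run of ASCII whitespace
# (spaces, tabs, CR, LF, ...) into one space, then trim both ends.
def collapse_whitespace(s)
  s.gsub(/\s+/, ' ').strip
end

s = "1/2 cup  \n\n\n\n\n            onion           \n             (chopped)\n"
puts collapse_whitespace(s)  # prints: 1/2 cup onion (chopped)
```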

Answer 1 (score: 18)

In Rails, you can use String#squish, which is an ActiveSupport extension.

# Outside Rails, load the String core extension that defines #squish
require 'active_support/core_ext/string/filters'

s = <<-EOS
1/2 cup  

            onion
EOS

s.squish
# => "1/2 cup onion"
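If ActiveSupport isn't available, a rough plain-Ruby approximation of squish is sketched below. The helper squish_like is hypothetical and is not ActiveSupport's actual source; it assumes squish's behaviour is "collapse internal whitespace runs to one space and trim both ends". It uses [[:space:]], which in Ruby is Unicode-aware, rather than \s.

```ruby
# Hypothetical stand-in for String#squish when Rails isn't loaded.
# [[:space:]] matches Unicode whitespace, including non-breaking spaces.
def squish_like(str)
  str.gsub(/[[:space:]]+/, ' ').strip
end

s = <<-EOS
1/2 cup

            onion
EOS

p squish_like(s)  # => "1/2 cup onion"
```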

Answer 2 (score: 9)

You want the squeeze method:

str.squeeze([other_str]*) → new_str
Builds a set of characters from the other_str parameter(s) using the procedure described for String#count. Returns a new string where runs of the same character that occur in this set are replaced by a single character. If no arguments are given, all runs of identical characters are replaced by a single character.

   "yellow moon".squeeze                  #=> "yelow mon"
   "  now   is  the".squeeze(" ")         #=> " now is the"
   "putters shoot balls".squeeze("m-z")   #=> "puters shot balls"
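To show why squeeze on its own doesn't fully answer the question, here is a sketch on a shortened version of the question's string. The last line, which first maps CR, LF and tab to spaces with tr, is an addition not taken from this answer.

```ruby
s = "1/2 cup  \n\n  onion"

# No arguments: each run of identical characters collapses,
# but newlines are never turned into spaces.
p s.squeeze        # => "1/2 cup \n onion"

# With " ": only runs of spaces collapse; the newlines stay as they are.
p s.squeeze(" ")   # => "1/2 cup \n\n onion"

# Mapping \r, \n and \t to spaces first lets squeeze(" ") finish the job.
p s.tr("\r\n\t", " ").squeeze(" ")  # => "1/2 cup onion"
```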

Answer 3 (score: 6)

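The problem with the simplest solution, gsub(/\s+/, ' '), is that it is slow, because it replaces every whitespace match, even a single space that is already correct. But usually there is exactly one space between words, and only runs of two or more whitespace characters need fixing.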

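A faster solution is gsub(/[\r\n\t]/, ' ').gsub(/ {2,}/, ' '): first turn the special whitespace characters (CR, LF, tab) into spaces, then collapse runs of two or more ordinary spaces.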

require 'benchmark'

# Sample text; each iteration works on a fresh, unfrozen copy of it.
TEXT = "Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.".freeze

def method1(s) s.gsub!(/\s+/, ' '); s end
def method2(s) s.gsub!(/[\r\n\t]/, ' '); s.gsub!(/ {2,}/, ' '); s end

Benchmark.bm do |x|
  n = 100_000
  x.report('method1') { n.times { method1(TEXT.dup) } }
  x.report('method2') { n.times { method2(TEXT.dup) } }
end

#        user     system      total        real
# method1  4.090000   0.010000   4.100000 (  4.124844)
# method2  1.590000   0.010000   1.600000 (  1.611443)
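The two-pass version only treats \r, \n and \t specially, so it agrees with gsub(/\s+/, ' ') on typical text but not on every input. A small sketch comparing them (one_pass and two_pass are hypothetical names); a form feed, which \s matches but [\r\n\t] does not, is one case where they differ.

```ruby
def one_pass(s)
  s.gsub(/\s+/, ' ')
end

def two_pass(s)
  s.gsub(/[\r\n\t]/, ' ').gsub(/ {2,}/, ' ')
end

sample = "Lorem   ipsum\n\n dolor \t\t\tsit"
p one_pass(sample) == two_pass(sample)  # => true

# A form feed is matched by \s but not by [\r\n\t], so the results diverge.
p one_pass("a\fb")  # => "a b"
p two_pass("a\fb")  # => "a\fb"
```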

Answer 4 (score: 4)

The accepted answer does not remove non-breaking space characters (U+00A0), because \s in Ruby matches only ASCII whitespace.

This should work in Ruby 1.9 and later:

fixed_string = s.gsub(/(\s|\u00A0)+/, ' ')
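A short sketch of the difference, writing the answer's (\s|\u00A0) as the equivalent character class [\s\u00A0]. The [[:space:]] line is an alternative not in the original answer: in Ruby, POSIX bracket classes are Unicode-aware, so it also matches non-breaking spaces and other Unicode whitespace.

```ruby
s = "1/2\u00A0cup \u00A0 onion"

# \s only covers ASCII whitespace, so the non-breaking spaces survive.
p s.gsub(/\s+/, ' ').include?("\u00A0")  # => true

# Adding U+00A0 to the class (as in the answer) removes them.
p s.gsub(/[\s\u00A0]+/, ' ')             # => "1/2 cup onion"

# POSIX class covering all Unicode whitespace.
p s.gsub(/[[:space:]]+/, ' ')            # => "1/2 cup onion"
```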

Answer 5 (score: 0)

If speed matters, this is the fastest option I found that still answers the question:

s.tr("\r\n\t", ' ').gsub(/ {2,}/, ' ')

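This replaces carriage returns, newlines and tabs with spaces, then replaces each run of multiple spaces with a single space.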
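A small sketch of the two steps (tr_then_gsub is a hypothetical name). tr pads a replacement list that is shorter than the source list with its last character, so a single ' ' maps all three characters to a space.

```ruby
def tr_then_gsub(s)
  # Step 1: \r, \n and \t each become a space.
  # Step 2: runs of two or more spaces collapse to one.
  s.tr("\r\n\t", ' ').gsub(/ {2,}/, ' ')
end

p "a\r\n\tb".tr("\r\n\t", ' ')  # => "a   b"
p tr_then_gsub("a\r\n\tb")      # => "a b"
p tr_then_gsub("one two")       # => "one two" (single spaces untouched)
```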

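I saw the benchmark Lev posted and compared variants of .gsub, .squeeze, .tr and .squish. I extended his benchmark, and although .squeeze is the fastest, it doesn't answer the question, because it never turns tabs or newlines into spaces: squeeze(" ") collapses only runs of spaces, and with no arguments a run of tabs or newlines merely shrinks to a single tab or newline.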

require 'benchmark'
require 'active_support/core_ext/string/filters' # String#squish / #squish!

# Sample text; each iteration works on a fresh, unfrozen copy of it.
TEXT = "Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.".freeze

# Replace multiple whitespace characters with a single space.
def method1(s) s.gsub!(/\s+/, ' '); s end # (in place)
def method2(s) s = s.gsub(/\s+/, ' '); s end

# Replace \r, \n, \t with a space, then replace multiple spaces with a single space.
def method3(s) s.gsub!(/[\r\n\t]/, ' '); s.gsub!(/ {2,}/, ' '); s end # (in place)
def method4(s) s = s.gsub(/[\r\n\t]/, ' ').gsub(/ {2,}/, ' '); s end

# Translate \r, \n, \t to spaces with tr, then replace multiple spaces with a single space.
def method5(s) s.tr!("\r\n\t", ' '); s.gsub!(/ {2,}/, ' '); s end # (in place)
def method6(s) s = s.tr("\r\n\t", ' ').gsub(/ {2,}/, ' '); s end

# Replace multiple whitespace characters with a single space (ActiveSupport).
def method7(s) s.squish!; s end # (in place)
def method8(s) s = s.squish; s end

# Combine multiple spaces into a single space (tabs and newlines untouched).
def method9(s) s.squeeze!(" "); s end # (in place)
def method10(s) s = s.squeeze(" "); s end

Benchmark.bm do |x|
  n = 100_000
  x.report('.gsub!      ') { n.times { method1(TEXT.dup) } }
  x.report('.gsub       ') { n.times { method2(TEXT.dup) } }
  x.report('.gsub!.gsub!') { n.times { method3(TEXT.dup) } }
  x.report('.gsub .gsub ') { n.times { method4(TEXT.dup) } }
  x.report('.tr!.gsub!  ') { n.times { method5(TEXT.dup) } }
  x.report('.tr .gsub   ') { n.times { method6(TEXT.dup) } }
  x.report('.squish!    ') { n.times { method7(TEXT.dup) } }
  x.report('.squish     ') { n.times { method8(TEXT.dup) } }
  x.report('.squeeze!   ') { n.times { method9(TEXT.dup) } }
  x.report('.squeeze    ') { n.times { method10(TEXT.dup) } }
end

I got these results:
#               user       system     total       real
# .gsub!        2.019544   0.030325   2.049869 (  2.059379)
# .gsub         1.968179   0.011204   1.979383 (  1.988050)
# .gsub!.gsub!  0.770042   0.014097   0.784139 (  0.787055)
# .gsub .gsub   0.728955   0.011577   0.740532 (  0.742887)
# .tr!.gsub!    0.487014   0.008260   0.495274 (  0.496820)
# .tr .gsub     0.487231   0.007769   0.495000 (  0.497164)
# .squish!      2.005224   0.011673   2.016897 (  2.025851)
# .squish       2.043497   0.013331   2.056828 (  2.066794)
# .squeeze!     0.117615   0.002004   0.119619 (  0.120140)
# .squeeze      0.196301   0.012094   0.208395 (  0.209267)
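The benchmark measures speed only. A quick correctness check on a shortened version of the same sample text shows which variants actually produce the collapsed string. The hash of lambdas is a hypothetical harness, and squish is left out so the sketch runs without ActiveSupport.

```ruby
sample   = "Lorem   ipsum\n\n dolor \t\t\tsit amet, consectetur\n \n\t\n adipiscing elit"
expected = "Lorem ipsum dolor sit amet, consectetur adipiscing elit"

# Non-mutating versions of the benchmarked approaches.
variants = {
  'gsub'         => ->(s) { s.gsub(/\s+/, ' ') },
  'gsub.gsub'    => ->(s) { s.gsub(/[\r\n\t]/, ' ').gsub(/ {2,}/, ' ') },
  'tr.gsub'      => ->(s) { s.tr("\r\n\t", ' ').gsub(/ {2,}/, ' ') },
  'squeeze(" ")' => ->(s) { s.squeeze(' ') },
}

variants.each do |name, fn|
  puts format('%-14s %s', name, fn.call(sample) == expected ? 'ok' : 'DIFFERENT')
end
```

Only the squeeze(" ") row reports DIFFERENT, because the tabs and newlines are left in place.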