Remove all special characters except apostrophes

Time: 2017-04-25 15:41:00

Tags: ruby regex

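Given a sentence, I want to count how many times each word occurs. This is the Word Count exercise from Exercism.io.

For example, for the input "olly olly in come free":

olly: 2
in: 1
come: 1
free: 1

I have this test, for example: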

  def test_with_quotations
    phrase = Phrase.new("Joe can't tell between 'large' and large.")
    counts = {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
    assert_equal counts, phrase.word_count
  end

This is my method:

  def word_count
    phrase = @phrase.downcase.split(/\W+/)
    counts = phrase.group_by{|word| word}.map {|k,v| [k, v.count]}
    Hash[*counts.flatten]
  end

With tests like the one above, I get this failure when I run them in the terminal:

  2) Failure:
PhraseTest#test_with_apostrophes [word_count_test.rb:69]:
--- expected
+++ actual
@@ -1 +1 @@
-{"first"=>1, "don't"=>2, "laugh"=>1, "then"=>1, "cry"=>1}
+{"first"=>1, "don"=>2, "t"=>2, "laugh"=>1, "then"=>1, "cry"=>1}

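My problem is removing all characters except the apostrophe (')...

The regex in my method almost works: phrase = @phrase.downcase.split(/\W+/). But it also removes the apostrophes...

I don't want to keep single quotes: 'Hello' => Hello, but Don't be cruel => Don't be cruel.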

3 answers:

Answer 0 (score: 4)

Maybe something like this:

string.scan(/\b[\w']+\b/i).each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }

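The regex uses word boundaries (\b). scan returns an array of the words it finds; each word in that array is then added to a hash whose default value for every key is zero, and its count is incremented.

Although this finds every word regardless of case, my solution keeps each word in the case in which it was originally found. It is now up to Nelly to decide whether to accept that as is, or to downcase either the original string or each array item as it is added to the hash.

I'll leave that decision to you :)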
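If lowercase keys are wanted, as the asker's tests expect, one option is to downcase the string before scanning. This is only a minimal sketch: the helper name count_words is hypothetical and not part of the answer above.

```ruby
# Hypothetical helper (name chosen for illustration): tally words
# case-insensitively while keeping apostrophes inside words.
def count_words(string)
  string.downcase
        .scan(/\b[\w']+\b/)
        .each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
end

counts = count_words("Joe can't tell between 'large' and large.")
```

With the sentence from test_with_quotations, counts should equal the hash that test expects, because \b cannot match between a space and a quote character, so the surrounding quotes never become part of a word.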

Answer 1 (score: 1)

Given:

irb(main):015:0> phrase
=> "First: don't laugh. Then: don't cry."

Try:

irb(main):011:0> Hash[phrase.downcase.scan(/[a-z']+/)
                     .group_by{|word| word.downcase}
                     .map{|word, words|[word, words.size]}
                    ]
=> {"first"=>1, "don't"=>2, "laugh"=>1, "then"=>1, "cry"=>1}

Based on your update, if you want to remove the single quotes, do that first:

irb(main):038:0> p2
=> "Joe can't tell between 'large' and large."
irb(main):039:0> p2.gsub(/(?<!\w)'|'(?!\w)/,'')
=> "Joe can't tell between large and large."

Then use the same method.
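Put together, the two steps might look like the sketch below. The Phrase class and word_count method names come from the asker's tests; the QUOTE constant and the exact layout are assumptions made for illustration. Note that [a-z'] ignores digits.

```ruby
# Sketch of the Phrase class implied by the asker's tests.
class Phrase
  # Assumed constant name. Matches an apostrophe that has no word character
  # directly before it, or none directly after it (treated as a quote mark).
  QUOTE = /(?<!\w)'|'(?!\w)/

  def initialize(phrase)
    @phrase = phrase
  end

  def word_count
    # Strip quote-like apostrophes first, then collect runs of letters/apostrophes.
    words = @phrase.downcase.gsub(QUOTE, '').scan(/[a-z']+/)
    words.group_by { |word| word }.map { |word, group| [word, group.size] }.to_h
  end
end

result = Phrase.new("Joe can't tell between 'large' and large.").word_count
```

Under these assumptions, result should match the counts hash in test_with_quotations, with "large" counted twice.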

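But, you say, gsub(/(?<!\w)'|'(?!\w)/,'') will remove the apostrophe in 'Twas the night before. To which I reply that you will eventually need to build a parser that can tell an apostrophe from a single quote if /(?<!\w)'|'(?!\w)/ alone is not sufficient.

You can also use word boundaries:

irb(main):041:0> Hash[p2.downcase.scan(/\b[a-z']+\b/)
                     .group_by{|word| word.downcase}
                     .map{|word, words|[word, words.size]}
                    ]
=> {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}

But that does not solve the 'Twas case either.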
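The leading-apostrophe limitation can be checked directly. A small sketch using the 'Twas sentence mentioned above; the variable names are chosen for illustration only:

```ruby
line = "'Twas the night before."

# Removing quote-like apostrophes also removes the leading apostrophe,
# because nothing precedes it.
stripped = line.gsub(/(?<!\w)'|'(?!\w)/, '')

# A word-boundary scan cannot begin a match on the apostrophe either,
# since \b needs a word character on one side.
words = line.downcase.scan(/\b[a-z']+\b/)
```

Here stripped becomes "Twas the night before." and words becomes ["twas", "the", "night", "before"], so neither regex can tell this apostrophe from an opening quote.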

Answer 2 (score: 0)

Another way:

str = "First: don't 'laugh'. Then: 'don't cry'."
reg = /
      [a-z]         # single letter
      [a-z']+       # one or more letters or apostrophes
      [a-z]         # single letter
      '?            # optional single apostrophe
      /ix           # case-insensitive and free-spacing regex

# The optional trailing ' also keeps a closing quote, and a word needs at least three characters to match.
str.scan(reg).group_by(&:itself).transform_values(&:count)
  #=> {"First"=>1, "don't"=>2, "laugh'"=>1, "Then"=>1, "cry'"=>1}