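Given a sentence, I want to count how many times each word occurs.
This is from the Exercism.io Word Count exercise.
For example, the input "olly olly in come free" should produce: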
olly: 2
in: 1
come: 1
free: 1
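For an input like this, where words are separated only by spaces, a minimal sketch is to split on whitespace and count with `Enumerable#tally`. This assumes Ruby 2.7 or later, and `simple_word_count` is a hypothetical helper name, not part of the exercise:

```ruby
# Hypothetical helper: counts words separated by whitespace only.
# String#split with no argument splits on runs of whitespace.
# Enumerable#tally (Ruby 2.7+) maps each element to its number of occurrences.
def simple_word_count(text)
  text.downcase.split.tally
end

counts = simple_word_count("olly olly in come free")
# counts == {"olly"=>2, "in"=>1, "come"=>1, "free"=>1}
```

This is only enough for space-separated input; punctuation and apostrophes are what make the real exercise harder.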
I have tests like this one:
def test_with_quotations
  phrase = Phrase.new("Joe can't tell between 'large' and large.")
  counts = {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
  assert_equal counts, phrase.word_count
end
Here is my method:
def word_count
  phrase = @phrase.downcase.split(/\W+/)
  counts = phrase.group_by{|word| word}.map {|k,v| [k, v.count]}
  Hash[*counts.flatten]
end
When I run the tests in the terminal, I get failures like this one (from the apostrophe test):
2) Failure:
PhraseTest#test_with_apostrophes [word_count_test.rb:69]:
--- expected
+++ actual
@@ -1 +1 @@
-{"first"=>1, "don't"=>2, "laugh"=>1, "then"=>1, "cry"=>1}
+{"first"=>1, "don"=>2, "t"=>2, "laugh"=>1, "then"=>1, "cry"=>1}
My problem is removing every character except the apostrophe (').
The regex in my method almost works:
phrase = @phrase.downcase.split(/\W+/)
but it removes the apostrophes.
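To see why, note that `\W` matches any non-word character and the apostrophe is not a word character, so `split(/\W+/)` cuts each contraction in two. A quick check, using the sentence from the failing test:

```ruby
# \W+ matches runs of non-word characters; "'" is one of them,
# so the contraction "don't" is split into "don" and "t".
words = "First: don't laugh. Then: don't cry.".downcase.split(/\W+/)
# words == ["first", "don", "t", "laugh", "then", "don", "t", "cry"]
```

That is exactly where the `"don"=>2, "t"=>2` entries in the failure output come from.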
I don't want to keep single quotes used as quotation marks:
'Hello'
=> Hello
But for Don't be cruel I want:
=> Don't
be
cruel
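One way to express this requirement, sketched here as an assumption rather than the exercise's reference solution, is to match runs of letters or digits that may be joined by single internal apostrophes. A quote at the edge of a word then never becomes part of a match:

```ruby
# Hypothetical implementation: a word is one or more letters or digits,
# optionally followed by groups of "'" plus more letters or digits.
# "can't" matches as a whole; the quotes around 'large' are left out.
class Phrase
  WORD = /[a-z0-9]+(?:'[a-z0-9]+)*/

  def initialize(phrase)
    @phrase = phrase
  end

  def word_count
    # Hash.new(0) gives every new word a starting count of zero.
    @phrase.downcase.scan(WORD).each_with_object(Hash.new(0)) do |word, counts|
      counts[word] += 1
    end
  end
end
```

A leading apostrophe that really belongs to the word, as in 'Twas, is still dropped by this pattern.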
Answer 0 (score: 4)
Something like this, perhaps:
string.scan(/\b[\w']+\b/i).each_with_object(Hash.new(0)) { |word, counts| counts[word] += 1 }
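The regex uses word boundaries (\b). scan returns an array of the words it finds; each word in that array is then added to a hash whose default value is zero, and its count is incremented.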
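Although the regex matches case-insensitively, my solution keeps each word in the case in which it was first found. It is up to you, Nelly, whether to accept that as is or to downcase either the original string or each word as it is added to the hash.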
I'll leave that decision to you :)
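If you choose to downcase, a minimal sketch is to lowercase the whole string before scanning, so "Joe" and "joe" end up under one key. Here `text` is a hypothetical stand-in for the `string` variable above:

```ruby
# Same word-boundary pattern, but the string is downcased first,
# so every key in the resulting hash is lowercase.
text = "Joe can't tell between 'large' and large."
counts = text.downcase.scan(/\b[\w']+\b/).each_with_object(Hash.new(0)) do |word, memo|
  memo[word] += 1
end
# counts == {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
```

The quotes around 'large' are not matched because there is no word boundary between a space and an apostrophe, and the trailing quote is given back when the final \b needs a letter before it.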
Answer 1 (score: 1)
Given:
irb(main):015:0> phrase
=> "First: don't laugh. Then: don't cry."
Try:
irb(main):011:0> Hash[phrase.downcase.scan(/[a-z']+/)
.group_by{|word| word.downcase}
.map{|word, words|[word, words.size]}
]
=> {"first"=>1, "don't"=>2, "laugh"=>1, "then"=>1, "cry"=>1}
Based on your update, if you want to remove the single quotes, do that first:
irb(main):038:0> p2
=> "Joe can't tell between 'large' and large."
irb(main):039:0> p2.gsub(/(?<!\w)'|'(?!\w)/,'')
=> "Joe can't tell between large and large."
Then use the same method.
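Putting the two steps together, a sketch of a counting method that strips quotation marks first and then counts. Taking the text as an argument is an assumption for illustration; in the exercise the method would read `@phrase` instead:

```ruby
# Step 1: delete every "'" that lacks a word character on either side,
#         i.e. keep only apostrophes sitting inside a word.
# Step 2: scan the remaining runs of letters and apostrophes and count them.
def word_count(text)
  unquoted = text.gsub(/(?<!\w)'|'(?!\w)/, '')
  unquoted.downcase.scan(/[a-z']+/).group_by(&:itself).transform_values(&:size)
end
```

`transform_values` needs Ruby 2.4 or later; on older versions the `Hash[...map...]` form above does the same job.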
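But, you may say, gsub(/(?<!\w)'|'(?!\w)/,'') will also remove the apostrophe in 'Twas the night before.
My answer is that you ultimately need to build a parser that can tell an apostrophe from a single quote; a regex such as /(?<!\w)'|'(?!\w)/ alone is not enough.
You can also use word boundaries:
irb(main):041:0> Hash[p2.downcase.scan(/\b[a-z']+\b/)
.group_by{|word| word.downcase}
.map{|word, words|[word, words.size]}
]
=> {"joe"=>1, "can't"=>1, "tell"=>1, "between"=>1, "large"=>2, "and"=>1}
but that does not solve the 'Twas problem either.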
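The 'Twas problem is easy to reproduce: the apostrophe there is at the start of the word, so the lookbehind `(?<!\w)` treats it exactly like an opening quote and deletes it:

```ruby
# The leading apostrophe of 'Twas has no word character before it,
# so the quote-stripping regex removes it just like an opening quote.
line = "'Twas the night before."
stripped = line.gsub(/(?<!\w)'|'(?!\w)/, '')
# stripped == "Twas the night before."
```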
Answer 2 (score: 0)
Another way:
str = "First: don't 'laugh'. Then: 'don't cry'."
reg = /
  [a-z]       # a letter
  (?:         # optionally followed by
    [a-z']*   #   any letters or apostrophes
    [a-z]     #   ending in a letter, so 1- and 2-letter words also match
  )?
  '?          # optional trailing apostrophe
/ix           # case-insensitive and free-spacing regex
str.scan(reg).group_by(&:itself).transform_values(&:count)
#=> {"First"=>1, "don't"=>2, "laugh'"=>1, "Then"=>1, "cry'"=>1}
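On Ruby 2.7 or later, `Enumerable#tally` does the same job as `group_by(&:itself).transform_values(&:count)` in a single call. The sketch below uses a compact one-line pattern of letters and apostrophes that also accepts one- and two-letter words:

```ruby
str = "First: don't 'laugh'. Then: 'don't cry'."
# A letter, optionally followed by letters or apostrophes ending in a letter,
# then an optional trailing apostrophe. The i flag makes it case-insensitive.
reg = /[a-z](?:[a-z']*[a-z])?'?/i
# Enumerable#tally (Ruby 2.7+) counts occurrences of each scanned word.
counts = str.scan(reg).tally
# counts == {"First"=>1, "don't"=>2, "laugh'"=>1, "Then"=>1, "cry'"=>1}
```

Note that the optional trailing apostrophe also captures closing quotes, which is why "laugh'" and "cry'" keep one.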