Counting the number of sentences in Ruby

Asked: 2015-09-08 07:13:47

Tags: ruby count sentence

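I have searched everywhere and have not managed to find a way to count the number of sentences in a string using Ruby. Does anyone know how to do this?

Example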

string = "The best things in an artist’s work are so much a matter of intuition, that there is much to be said for the point of view that would altogether discourage intellectual inquiry into artistic phenomena on the part of the artist. Intuitions are shy things and apt to disappear if looked into too closely. And there is undoubtedly a danger that too much knowledge and training may supplant the natural intuitive feeling of a student, leaving only a cold knowledge of the means of expression in its place. For the artist, if he has the right stuff in him ... "

This string should return the number 4.

4 Answers:

Answer 0: (score: 4)

You can split the text into sentences and count them. Here:

string.scan(/[^\.!?]+[\.!?]/).map(&:strip).count # scan extracts each sentence with the regex; strip removes leading and trailing whitespace.
# => 4 

Explanation of the regex:

[^\.!?]

Inside a character class [^ ], the caret is the negation operator. It means we are looking for characters that are not in the list: . ! ?

+

is a greedy quantifier that matches between one and unlimited times. (Here it captures our sentence and ignores repetitions such as ...)

[\.!?]  

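matches one of the characters . ! ?

In short, we capture every character that is not . ! or ? until we reach a . ! or ?. Each such match can basically be regarded as a sentence (broadly speaking).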
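To see what the pattern actually captures, here is a small sketch on a shorter string. The `sample` text and the `sentences` variable are made up for illustration and are not part of the question.

```ruby
# Hypothetical sample text, chosen to end with a dangling ellipsis.
sample = "Hello there. How are you? Fine ... "

# Each match is a run of non-terminator characters followed by one terminator.
sentences = sample.scan(/[^\.!?]+[\.!?]/).map(&:strip)

p sentences
# => ["Hello there.", "How are you?", "Fine ."]
puts sentences.count
# => 3
```

Only the first dot of the ellipsis ends a match. The remaining two dots cannot start a new match, because `[^\.!?]+` needs at least one character that is not a terminator.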

Answer 1: (score: 3)

I think it makes sense to treat a word character followed by ?, ! or . as the sentence delimiter:

string.strip.split(/\w[?!.]/).length
#=> 4

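So I don't treat ... as a delimiter when it hangs on its own, like this:

  • "I waited for a while ... and then I went home"

But then again, maybe I should...

To me, perhaps a better delimiter is a punctuation mark followed by some whitespace and a capital letter: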

string.split(/[?!.]\s+[A-Z]/).length
#=> 4
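As a rough comparison of the two delimiters, the sketch below wraps each one in a helper and runs both on two made-up inputs. The method names `count_by_word_char` and `count_by_capital` and both sample strings are hypothetical, introduced only for this illustration.

```ruby
# Hypothetical helper: word character followed by a terminator.
def count_by_word_char(text)
  text.strip.split(/\w[?!.]/).length
end

# Hypothetical helper: terminator, whitespace, then a capital letter.
def count_by_capital(text)
  text.split(/[?!.]\s+[A-Z]/).length
end

ellipsis = "I waited for a while ... and then I went home."
decimal  = "He paid $3.50 today. Good."

puts count_by_word_char(ellipsis) # => 1
puts count_by_capital(ellipsis)   # => 1
puts count_by_word_char(decimal)  # => 3 (the "3." in 3.50 is taken as a sentence end)
puts count_by_capital(decimal)    # => 2
```

Both versions leave the free-standing ellipsis alone, but only the second one ignores the decimal point, since it requires whitespace and a capital letter after the punctuation.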

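Answer 2: (score: 1)

Sentences end with periods, question marks, and exclamation marks. They can also be separated by dashes and other punctuation, but we won't worry about these rare cases. The split is simple: instead of asking Ruby to split the text on one type of character, you ask it to split on any of the three, like this: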

txt = "The best things in an artist’s work are so much a matter of intuition, that there is much to be said for the point of view that would altogether discourage intellectual inquiry into artistic phenomena on the part of the artist. Intuitions are shy things and apt to disappear if looked into too closely. And there is undoubtedly a danger that too much knowledge and training may supplant the natural intuitive feeling of a student, leaving only a cold knowledge of the means of expression in its place. For the artist, if he has the right stuff in him ... "

sentence_count = txt.split(/\.|\?|!/).length
puts sentence_count
#=> 7
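The count is 7 rather than 4 because every dot of the trailing "..." is a separate split point, which leaves empty and whitespace-only pieces in the result. A minimal sketch of that effect, and of one possible way to filter those pieces out, on a made-up string (`sample`, `pieces` and `cleaned` are illustrative names, not from the answer):

```ruby
# Hypothetical sample text ending with a dangling ellipsis.
sample = "One. Two! Three ... "

pieces = sample.split(/\.|\?|!/)
p pieces
# => ["One", " Two", " Three ", "", "", " "]
puts pieces.length
# => 6

# Drop the pieces that contain nothing but whitespace.
cleaned = pieces.map(&:strip).reject(&:empty?)
puts cleaned.length
# => 3
```

Note that `split` only discards trailing empty strings, so the final " " piece keeps the two empty strings before it in the array.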

Answer 3: (score: 0)

string.squeeze('.!?').count('.!?')
  #=> 4
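For anyone wondering how this works: `squeeze` collapses runs of the same character from the given set into one, and `count` then tallies the remaining terminators. A small sketch on two made-up strings (both samples are assumptions for illustration), which also shows a case where the count runs high:

```ruby
# Hypothetical samples: one with an ellipsis, one with mixed terminators.
with_ellipsis = "Hmm... OK."
with_mixed    = "Really?! Yes."

# "..." is a run of the same character, so it collapses to a single ".".
puts with_ellipsis.squeeze('.!?')
# => Hmm. OK.
puts with_ellipsis.squeeze('.!?').count('.!?')
# => 2

# "?!" is two different characters, so nothing is squeezed and both are counted.
puts with_mixed.squeeze('.!?').count('.!?')
# => 3
```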