Ruby Text Parsing: Finding the start of the next sentence

Time: 2012-09-18 07:22:49

Tags: ruby-on-rails ruby text string-parsing

I have a bunch of strings in my database that look like this:

  

driving home from work. The dog leaped off the sofa to greet his master at the door. He licked his face clean.

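The string starts in the middle of a sentence. I want a way to cut off the initial incomplete sentence so that the text starts at "The dog leaped off the sofa to greet his master at the door. He licked his face clean."

How can I do this?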

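4 Answers:

Answer 0: (score: 1)

The problem is how to define an incomplete sentence. We can assume that every sentence starting with an uppercase character is a complete sentence. If so, the code might look like this: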

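str = 'driving home from work. The dog leaped off the sofa to greet his master at the door. He licked his face clean.'
# Split on periods; the periods themselves are dropped and re-added below.
sentences = str.split('.')
# Drop the first piece unless it starts with an uppercase letter.
sentences.shift if sentences[0][0].downcase == sentences[0][0]
# Rejoin the remaining pieces and restore the final period.
sentences.join('.').strip << '.'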

A bit hacky, but it works.
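
The same idea can be wrapped in a method that also copes with an empty string and keeps `!` and `?` terminators. This is only a sketch: `drop_leading_fragment` is a hypothetical name, and it assumes that sentences are separated by whitespace after the terminator.

```ruby
# Hypothetical helper (not from the answer above): drops a leading fragment
# when the string starts with a lowercase letter.
def drop_leading_fragment(str)
  return str if str.empty?
  # A first character equal to its own upcase (capitals, but also digits
  # and punctuation) is treated as the start of a complete sentence.
  return str if str[0] == str[0].upcase

  # Split on whitespace that follows ., ! or ?, so each terminator stays
  # attached to its sentence.
  sentences = str.split(/(?<=[.!?])\s+/)
  # Discard the leading fragment and rejoin the rest.
  sentences.drop(1).join(' ')
end

text = 'driving home from work. The dog leaped off the sofa. He licked his face clean!'
puts drop_leading_fragment(text)
```

Because the terminators are never removed, nothing has to be appended afterwards, and a string without any complete sentence simply yields an empty string.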

Answer 1: (score: 1)

The simplest answer:

str = 'driving home from work. The dog leaped off the sofa to greet his master at the door. He licked his face clean.'
# Non-bang methods are used because sub! and strip! return nil when nothing changes.
str = str.sub(/\A[^A-Z].+?\./, '').strip
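
One thing to watch with the bang methods: `String#sub!` and `String#strip!` return `nil` when they make no change, so chaining them raises `NoMethodError` on a string that already starts with a capital letter. A short demonstration, using a made-up sample string:

```ruby
complete = 'The dog leaped off the sofa.'

# sub! returns nil when the pattern does not match.
result = complete.dup.sub!(/\A[^A-Z].+?\./, '')
# => nil

# The non-bang forms always return a string, so they are safe to chain.
trimmed = complete.sub(/\A[^A-Z].+?\./, '').strip

fragment = 'driving home from work. The dog leaped off the sofa.'
cleaned = fragment.sub(/\A[^A-Z].+?\./, '').strip
puts cleaned
```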

Answer 2: (score: 0)

https://github.com/ged/linkparser

This might help.

Answer 3: (score: 0)

Something like this, perhaps?

str = "driving home from work. The dog leaped off the sofa to greet his master at the door. He licked his face clean."
# str[0] works in plain Ruby; in Rails, str.first (ActiveSupport) is equivalent.
str[0] == str[0].upcase ? str : str.split(".")[1..-1].join(".").lstrip << "."

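This assumes that a capital letter at the start marks the beginning of a sentence; otherwise it is not possible. There are other cases to consider: what if the string starts with a number, e.g. "1 dog ran away."? And "The dog... is a dog...": is that one sentence?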
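
One possible way to handle those edge cases is sketched below. The rules are assumptions made for illustration, not something the answers agree on: a leading digit counts as a sentence start, and a boundary is one or more terminators followed by whitespace and then a capital letter or digit, so an ellipsis followed by a lowercase word does not end a sentence. `next_sentence_start` is a hypothetical name, and the pattern is ASCII-only.

```ruby
# Assumed rule: ., ! or ? (possibly repeated, as in "...") followed by
# whitespace and then a capital letter or digit marks a sentence boundary.
BOUNDARY = /[.!?]+\s+(?=[A-Z0-9])/

# Hypothetical helper: returns the text from the first complete sentence on.
def next_sentence_start(str)
  # Assumed rule: a leading capital letter or digit means the string
  # already starts with a complete sentence.
  return str if str =~ /\A[A-Z0-9]/

  # partition returns [before, separator, after]; "after" is empty when
  # no boundary is found.
  _fragment, _separator, rest = str.partition(BOUNDARY)
  rest
end

puts next_sentence_start('driving home from work. The dog leaped off the sofa.')
puts next_sentence_start('1 dog ran away. The dog barked.')
puts next_sentence_start('and the dog... is a dog. He ran.')
```

Abbreviations such as "Mr. Smith" would still be misread as boundaries under these rules; handling them needs a list of exceptions or a natural-language library.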