Remove periods from the end of markdown paragraphs

Time: 2010-07-20 02:53:56

Tags: ruby regex

I have a lot of posts written in markdown, and I need to remove the period at the end of every paragraph.

The end of a paragraph in markdown is delimited by either:

  • 2 or more \n characters
  • the end of the string
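
As a rough illustration of that definition, a paragraph boundary can be expressed in Ruby as a run of two or more newlines. The sample text and variable names below are made up for the example.

```ruby
# Hypothetical sample text: three paragraphs separated by blank lines.
text = "First paragraph.\n\nSecond paragraph,\nstill the second.\n\n\nThird paragraph."

# Two or more consecutive newlines mark a paragraph boundary;
# the end of the string closes the last paragraph.
paragraphs = text.split(/\n{2,}/)

paragraphs.each_with_index do |paragraph, index|
  puts "#{index + 1}: #{paragraph.inspect}"
end
```

A single \n inside the second paragraph does not split it; only the blank lines do.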

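However, there are these edge cases:

  1. Ellipses
  2. Acronyms (for example, I don't want to drop the final period of "Notorious B.I.G." when it falls at the end of a paragraph). I think you can handle this case by saying "don't remove the final period if it is preceded by a capital letter that is itself preceded by another period."
  3. Special cases: e.g., i.e., etc.

Here is a regex that matches posts with offending periods, but it doesn't account for (2) and (3) above: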

    /[^.]\.(\n{2,}|\z)/
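
To make the gap concrete, here is a small sketch that runs the pattern from the question against a few sample endings. The constant name and the sample strings are assumptions chosen for illustration.

```ruby
# The pattern from the question: a non-period character, a period,
# then a paragraph break or the end of the string.
OFFENDING_PERIOD = /[^.]\.(\n{2,}|\z)/

# Hypothetical sample endings, one per case discussed above.
samples = [
  "A plain sentence.",
  "A trailing ellipsis...",
  "Notorious B.I.G.",
  "apples, pears, etc."
]

# Record whether each sample is flagged as having an offending period.
results = samples.map { |sample| [sample, !(sample =~ OFFENDING_PERIOD).nil?] }

results.each { |sample, matched| puts "#{sample.inspect} matched: #{matched}" }
```

The ellipsis is correctly skipped, but the acronym and the "etc." ending are both flagged. Note also that the pattern consumes the character before the period and any trailing newlines, so a substitution built on it would have to put those back.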

2 answers:

Answer 0: (score: 1)

(?<!\.[a-zA-Z]|etc|\.\.)\.(?=\n{2,}|\Z)
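  • (?<!\.[a-zA-Z]|etc|\.\.) - a negative lookbehind that makes sure the period is not preceded by a sequence such as .T, etc, or .. (the last one covers ellipses).
  • \. - the period itself
  • (?=\n{2,}|\Z) - a lookahead for the end of a markdown paragraph (two or more newlines, or the end of the string)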
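
The same pattern can also be written in extended (x) mode so that each part carries its own comment. This is only a sketch: the constant and method names are hypothetical, and lookbehind requires Ruby 1.9 or later.

```ruby
# The answer's pattern in extended mode; whitespace and comments are ignored.
FINAL_PERIOD = /
  (?<! \.[a-zA-Z] | etc | \.\. )  # not preceded by .X, etc, or ..
  \.                              # the period to drop
  (?= \n{2,} | \Z )               # followed by a paragraph break or end of text
/x

# Hypothetical helper name, used only for this illustration.
def strip_final_periods(text)
  text.gsub(FINAL_PERIOD, "")
end

puts strip_final_periods("One.\n\nTwo, etc.\n\nThree...\n\nFour.")
```

Because both the lookbehind and the lookahead are zero-width, only the period itself is replaced and the paragraph breaks are left untouched.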

Test:

s = """this is a paragraph.

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced.

this is end of text."""
print s.gsub(/(?<!\.[a-zA-Z]|etc|\.\.)\.(?=[\n]{2,}|\Z)/, "") 
print "\n"

Output:

this is a paragraph

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced

this is end of text

Answer 1: (score: 0)

A Ruby 1.8.7-compatible algorithm (no lookbehind needed):

s = %{this is a paragraph.

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced.

this is end of text.}.strip

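# Split into paragraphs, drop the final period where allowed, then rejoin.
a = s.split(/\n{2,}/).each do |paragraph|
  next unless paragraph.match(/\.\Z/)                   # no trailing period
  next if paragraph.match(/(\.[a-zA-Z]|etc|\.\.)\.\Z/)  # ellipsis, acronym, e.g., etc.
  paragraph.chop!                                       # remove the period in place
end.join("\n\n")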

>> puts a
this is a paragraph

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced

this is end of text
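
For reuse, the same split-and-check idea could be wrapped in a method that returns a new string instead of mutating each paragraph. This is a minimal sketch; the constant and method names are hypothetical and not part of the answer above.

```ruby
# Endings that should keep their final period: .X (acronyms, e.g., i.e.),
# etc, and .. (ellipsis), each followed by the final period.
EXCEPTION_ENDING = /(\.[a-zA-Z]|etc|\.\.)\.\z/

# Hypothetical helper: returns a copy of text with paragraph-final periods removed.
def strip_paragraph_periods(text)
  text.strip.split(/\n{2,}/).map do |paragraph|
    if paragraph =~ /\.\z/ && paragraph !~ EXCEPTION_ENDING
      paragraph[0...-1] # drop the final period
    else
      paragraph
    end
  end.join("\n\n")
end

puts strip_paragraph_periods("A.\n\n\nB, e.g.\n\nC...")
```

As with the answer's version, paragraphs are rejoined with exactly two newlines, so longer runs of blank lines in the input are collapsed.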