Question

I have a lot of posts written in markdown, and I need to remove the period at the end of every paragraph.

In markdown, the end of a paragraph is delimited by:

2 or more \ns, or
the end of the string

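However, there are these edge cases:

1. Ellipses
2. Acronyms (e.g., I don't want to drop the final period of "Notorious B.I.G." when it falls at the end of a paragraph). I think you can handle this case by saying "don't remove the final period if it is preceded by a capital letter that is itself preceded by another period".
3. Special cases: e.g., i.e., etc.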

Here is a regex that matches posts containing an offending period, but it doesn't account for (2) and (3) above:

/[^.]\.(\n{2,}|\z)/
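To make the gap concrete, here is a small sketch that runs the pattern above against one sample per edge case. The sample strings and the method name `question_pattern` are made up for illustration.

```ruby
# Pattern copied from the question; \z is the absolute end of the string.
def question_pattern
  /[^.]\.(\n{2,}|\z)/
end

samples = [
  "plain sentence.",   # should match, and does
  "an ellipsis...",    # should be skipped, and is
  "Notorious B.I.G.",  # should be skipped, but matches
  "and so on, etc."    # should be skipped, but matches
]

samples.each do |text|
  verdict = (text =~ question_pattern) ? "match" : "no match"
  puts "#{text.inspect}: #{verdict}"
end

# [^.] consumes the character before the period, so a plain gsub
# would delete that character as well.
puts "plain sentence.".gsub(question_pattern, "")
```

Because `[^.]` is part of the match, this pattern is only suitable for detecting posts, not for doing the replacement directly.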

Answer 1

(?<!\.[a-zA-Z]|etc|\.\.)\.(?=\n{2,}|\Z)

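(?<!\.[a-zA-Z]|etc|\.\.) - a lookbehind that makes sure the period is not preceded by a sequence such as .T, etc, or .. (for ellipses).
\. - the period itself
(?=\n{2,}|\Z) - a lookahead that finds the end of a markdown paragraph (two or more newlines, or the end of the string)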
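The three parts above can also be written in extended mode (/x), which lets each piece carry its own comment. This is only a sketch: the method name `strip_trailing_periods` is hypothetical, and the pattern is the same one as in this answer.

```ruby
# Hypothetical helper wrapping the pattern from this answer.
def strip_trailing_periods(text)
  pattern = /
    (?<! \.[a-zA-Z] | etc | \.\. )  # not preceded by a period plus letter, by etc, or by two periods
    \.                              # the period to remove
    (?= \n{2,} | \Z )               # followed by a blank line or the end of the string
  /x
  text.gsub(pattern, "")
end

puts strip_trailing_periods("First paragraph.\n\nSee B.I.G.\n\nAnd so on...")
```

Since the lookbehind and lookahead do not consume characters, only the period itself is replaced.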

Test:

s = """this is a paragraph.

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced.

this is end of text."""
puts s.gsub(/(?<!\.[a-zA-Z]|etc|\.\.)\.(?=\n{2,}|\Z)/, "")

Output:

this is a paragraph

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced

this is end of text
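One detail worth checking against your own posts: the question uses \z while this answer uses \Z. In Ruby, \Z also matches just before a single trailing newline, whereas \z matches only at the absolute end of the string. The sketch below compares the two on a made-up string that ends with one newline; both method names are hypothetical.

```ruby
# Same pattern as in this answer, ending in \Z.
def strip_with_big_z(text)
  text.gsub(/(?<!\.[a-zA-Z]|etc|\.\.)\.(?=\n{2,}|\Z)/, "")
end

# Identical except for \z, as in the question's pattern.
def strip_with_small_z(text)
  text.gsub(/(?<!\.[a-zA-Z]|etc|\.\.)\.(?=\n{2,}|\z)/, "")
end

text = "last paragraph.\n"    # a post that ends with one newline

p strip_with_big_z(text)      # period removed, newline kept
p strip_with_small_z(text)    # period left in place
```

If your posts are stored with a trailing newline, \Z is probably the variant you want.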

Answer 2

A Ruby 1.8.7-compatible algorithm:

s = %{this is a paragraph.

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced.

this is end of text.}.strip

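a = s.split(/\n{2,}/).each do |paragraph|
  # Skip paragraphs that do not end with a period.
  next unless paragraph.match(/\.\Z/)
  # Keep acronyms (".G."), "etc." and ellipses ("...").
  next if paragraph.match(/(\.[a-zA-Z]|etc|\.\.)\.\Z/)
  # Remove the final period in place; each returns the same array.
  paragraph.chop!
end.join("\n\n")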

>> puts a
this is a paragraph

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced

this is end of text
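The loop above relies on chop! mutating each paragraph in place. A variant that builds new strings with map instead might look like the sketch below; the method name `strip_paragraph_periods` is hypothetical and the rules are the same as in this answer. Like the original, it rejoins paragraphs with exactly two newlines.

```ruby
# Hypothetical non-mutating variant of the split/join approach.
def strip_paragraph_periods(text)
  text.strip.split(/\n{2,}/).map do |paragraph|
    if paragraph =~ /(\.[a-zA-Z]|etc|\.\.)\.\z/
      paragraph                   # protected ending: keep unchanged
    else
      paragraph.sub(/\.\z/, "")   # drop one trailing period, if present
    end
  end.join("\n\n")
end

puts strip_paragraph_periods("One.\n\nTwo...\n\nB.I.G.\n\nlast.")
```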

Remove periods from the end of markdown paragraphs
