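I have a lot of posts written in Markdown, and I need to remove the period at the end of every paragraph in each of them.
In Markdown, the end of a paragraph is delimited by:
- two or more \ns, or
- the end of the string
However, there are these edge cases:
1. Ellipses (...)
2. Acronyms whose final period should stay (B.I.G.)
3. Abbreviations: e.g., i.e., etc.
Here is a regex that matches posts with offending periods, but it does not account for (2) and (3) above: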
/[^.]\.(\n{2,}|\z)/
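To make the limitation concrete, here is a small sketch that applies the pattern above with gsub. The extra capture group around [^.] and the sample text are my own additions for illustration; they are not part of the original pattern.

```ruby
# The question's pattern, with the character before the period captured (\1)
# so that gsub can put it back, and the paragraph break captured as \2.
NAIVE = /([^.])\.(\n{2,}|\z)/

text = "First paragraph.\n\nTrailing ellipsis...\n\nNotorious B.I.G.\n\nSee e.g."
result = text.gsub(NAIVE, '\1\2')
puts result
```

The ellipsis survives because the final period is preceded by another period, but the closing periods of "B.I.G." and "e.g." are stripped along with the ordinary one.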
Answer 0 (score: 1)
(?<!\.[a-zA-Z]|etc|\.\.)\.(?=\n{2,}|\Z)
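(?<!\.[a-zA-Z]|etc|\.\.) - a negative lookbehind that makes sure the period is not preceded by a sequence such as .T, etc, or .. (for ellipses).
\. - the period itself.
(?=\n{2,}|\Z) - a lookahead that finds the end of a Markdown paragraph (two or more newlines, or the end of the string).
Test: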
s = """this is a paragraph.

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced.

this is end of text."""
# Remove every paragraph-final period that the lookbehind does not protect.
puts s.gsub(/(?<!\.[a-zA-Z]|etc|\.\.)\.(?=\n{2,}|\Z)/, "")
Output:
this is a paragraph

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced

this is end of text
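If more abbreviations need protecting, one possible way to package the same regex is sketched below. The method name, the constant names, and the extra "vs" entry are hypothetical and only there for illustration; the structure of the pattern is the one from this answer.

```ruby
# Hypothetical list of abbreviations whose trailing period must be kept.
# "etc" comes from the answer; "vs" is an illustrative addition.
ABBREVIATIONS = %w[etc vs].freeze

# Escape each entry so it is matched literally inside the lookbehind.
KEEP = ABBREVIATIONS.map { |a| Regexp.escape(a) }.join("|")

# Same shape as the answer's regex, with the abbreviation list interpolated.
TRAILING_PERIOD = /(?<!\.[a-zA-Z]|#{KEEP}|\.\.)\.(?=\n{2,}|\Z)/

def strip_paragraph_periods(text)
  text.gsub(TRAILING_PERIOD, "")
end

sample = "A plain sentence.\n\nWait for it...\n\nNotorious B.I.G.\n\n" \
         "Apples, pears, etc.\n\nRuby vs.\n\nThe end.\n"
cleaned = strip_paragraph_periods(sample)
puts cleaned
```

Note that the lookbehind has no word boundary, so any paragraph ending in a word that merely ends with one of the listed abbreviations also keeps its period. Because \Z also matches just before a final newline, the last paragraph is handled even when the text ends with a single "\n".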
Answer 1 (score: 0)
A Ruby 1.8.7-compatible algorithm:
s = %{this is a paragraph.

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced.

this is end of text.}.strip
# Split into paragraphs, chop an unprotected final period in place, then rejoin.
a = s.split(/\n{2,}/).each do |paragraph|
  next unless paragraph.match(/\.\Z/)
  next if paragraph.match(/(\.[a-zA-Z]|etc|\.\.)\.\Z/)
  paragraph.chop!
end.join("\n\n")
>> puts a
this is a paragraph

this ends with an ellipsis...

this ends with etc.

this ends with B.I.G.

this ends with e.g.

this should be replaced

this is end of text
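One side effect of the split/join approach is that any run of three or more newlines is normalized to exactly two. If the original spacing matters, a variant along these lines might work. It is only a sketch: the method name is made up, and it relies on String#split keeping the separators when the pattern contains a capture group.

```ruby
# Hypothetical variant: keeps the original paragraph separators intact.
def strip_periods_keep_spacing(text)
  # Splitting on a captured group returns paragraphs and separators alternately.
  text.split(/(\n{2,})/).map do |chunk|
    if chunk =~ /\.\z/ && chunk !~ /(\.[a-zA-Z]|etc|\.\.)\.\z/
      chunk[0...-1] # drop the unprotected final period
    else
      chunk         # separators and protected endings pass through unchanged
    end
  end.join
end

text = "One.\n\n\nTwo...\n\nThree etc.\n\nFour."
out = strip_periods_keep_spacing(text)
puts out
```

Like the answer above, it avoids lookbehind. A single trailing newline at the very end of the text would hide the last period from the /\.\z/ check, so stripping the input first, as the answer does, is still advisable.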