I have some text with hard line breaks in it, like this:
This should all be on one line
since it's one sentence.

This is a new paragraph that
should be separate.
I want to remove the single newlines but keep the double newlines, like this:
This should all be on one line since it's one sentence.

This is a new paragraph that should be separate.
Is there a regex to do this? (Or some other simple way?)
So far this is the only working solution I have, but it feels hacky.
txt = txt.gsub(/(\r\n|\n|\r)/,'[[[NEWLINE]]]')
txt = txt.gsub('[[[NEWLINE]]][[[NEWLINE]]]', "\n\n")
txt = txt.gsub('[[[NEWLINE]]]', " ")
Answer 0 (score: 9)
Replace every newline that is neither preceded nor followed by another newline:
text = <<END
This should all be on one line
since it's one sentence.

This is a new paragraph that
should be separate.
END
p text.gsub /(?<!\n)\n(?!\n)/, ' '
#=> "This should all be on one line since it's one sentence.\n\nThis is a new paragraph that should be separate. "
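The question's own code also handles \r\n and \r line endings, which the lookaround regex above does not. A minimal sketch, assuming you want the same behavior for Windows and old Mac text, normalizes line endings first and then applies the same regex; the helper name join_single_newlines is hypothetical. It needs Ruby 1.9+ for the lookbehind.

```ruby
# Hypothetical helper: joins lines separated by a single newline and
# leaves blank-line paragraph breaks untouched (Ruby 1.9+ for lookbehind).
def join_single_newlines(txt)
  # Assumption: treat \r\n (Windows) and lone \r (old Mac) as newlines too
  normalized = txt.gsub(/\r\n?/, "\n")
  # A newline with no newline directly before or after it becomes a space
  normalized.gsub(/(?<!\n)\n(?!\n)/, ' ')
end

text = "This should all be on one line\r\nsince it's one sentence.\r\n\r\nThis is a new paragraph that\r\nshould be separate."
puts join_single_newlines(text)
```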
Or, for Ruby 1.8, which has no lookbehind:
txt.gsub! /([^\n])\n([^\n])/, '\1 \2'
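One caveat with this variant: the second ([^\n]) group consumes the character after the newline, so two neighboring matches cannot share it, and a line consisting of a single character leaves the following newline unjoined. Ruby 1.8 does support lookahead (only lookbehind is missing), so a sketch that keeps the trailing character out of the match avoids the problem; the sample strings are made up for illustration.

```ruby
text = "a\nb\nc"

# The 1.8 regex: the trailing ([^\n]) consumes "b", so the newline after
# "b" has no unconsumed character in front of it and is skipped
consuming = text.gsub(/([^\n])\n([^\n])/, '\1 \2')

# Lookahead variant: only the character before the newline is consumed
lookahead = text.gsub(/([^\n])\n(?=[^\n])/, '\1 ')

p consuming  # => "a b\nc"
p lookahead  # => "a b c"
```

A double newline is still left alone by the lookahead variant, because the character after the first newline is itself a newline.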
Answer 1 (score: 3)
text.gsub!(/(\S)[^\S\n]*\n[^\S\n]*(\S)/, '\1 \2')
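The two (\S) groups serve the same purpose as the lookarounds in @sln's regex ((?<!\s)(?<!^) and (?!\s)(?!$)): they make sure the newline sits between two non-whitespace characters. The [^\S\n]*\n[^\S\n]* part consumes any other whitespace around the newline, which lets us normalize it all to a single space. The groups also make the regex easier to read and, perhaps most importantly, they work in Ruby versions before 1.9, which don't support lookbehinds.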
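To see what the [^\S\n]* runs buy you, the sketch below uses made-up sample text where the first line ends with trailing spaces and the next line is indented; the regex from this answer collapses the whole run into one space. It uses gsub rather than gsub!, because gsub! returns nil when nothing is replaced.

```ruby
text = "This should all be on one line   \n   since it's one sentence.\n\nThis is a new paragraph"

# The [^\S\n]* runs swallow spaces and tabs on both sides of the newline,
# so the whole run is replaced by exactly one space
joined = text.gsub(/(\S)[^\S\n]*\n[^\S\n]*(\S)/, '\1 \2')

p joined
# => "This should all be on one line since it's one sentence.\n\nThis is a new paragraph"
```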
Answer 2 (score: 1)
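There is more to formatting (turning off word wrap) than you might think. If the output is the result of a formatting operation, you should go by that formatter's rules to reverse-engineer the original.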
For example, the test you have there is:
This should all be on one line
since it's one sentence.

This is a new paragraph that
should be separate.
If you removed just the single newlines, it would look like this:
This should all be on one line since it's one sentence.

This is a new paragraph thatshould be separate.
Also, other formatting, such as intentional line breaks, will be lost. For example:
This is Chapter 1
Section a
Section b
turns into
This is Chapter 1 Section a Section b
Finding the offending newlines is easy: /(?<!\n)\n(?!\n)/
But what do you replace them with?
Edit: Actually, even finding the standalone newlines isn't easy, because visually they sit between hidden horizontal whitespace.
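A small illustration of that hidden-whitespace problem, using made-up text where the "blank" line between two paragraphs actually contains two spaces: the newline-only lookarounds merge the paragraphs, while a whitespace-aware version of the regex (the second pattern listed below) leaves them alone.

```ruby
# The "blank" line separating the paragraphs actually holds two spaces
text = "one\n  \ntwo"

# Newline-only lookarounds: each newline has a space, not a newline,
# on one side, so both are replaced and the paragraphs merge
naive = text.gsub(/(?<!\n)\n(?!\n)/, ' ')

# Whitespace-aware lookarounds: both newlines touch whitespace,
# so nothing matches and the paragraph break survives
aware = text.gsub(/(?<!\s)[^\S\n]*\n[^\S\n]*(?!\s)/, ' ')

p naive  # => "one    two"
p aware  # => "one\n  \ntwo"
```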
There are 4 options to choose from.
Remove the newline, keep the surrounding formatting:
$text =~ s/(?<!\s)([^\S\n]*)\n([^\S\n]*)(?!\s)/$1$2/g;
Remove the newline and the formatting, substitute a space:
$text =~ s/(?<!\s)[^\S\n]*\n[^\S\n]*(?!\s)/ /g;
Same as above, but ignore newlines at the beginning or end of the string:
$text =~ s/(?<!\s)(?<!^)[^\S\n]*\n[^\S\n]*(?!$|\s)/ /g;
$text =~ s/(?<!\s)(?<!^)([^\S\n]*)\n([^\S\n]*)(?!$|\s)/$1$2/g;
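Since the question is about Ruby, here is a sketch of the four Perl substitutions translated to gsub (Ruby 1.9+ for the lookbehind). One assumption to flag: Perl's ^ and $ without /m refer to the string boundaries, but Ruby's ^ and $ always match at line boundaries, so (?<!^) and (?!$|\s) would mean something different in Ruby. For the last two options this sketch swaps in (?<=\S) and (?=\S), which require a non-whitespace character to actually exist on each side and should express the same condition. The sample text is made up and starts and ends with a newline so the difference shows.

```ruby
text = "\nThis should all be on one line\nsince it's one sentence.\n\nThis is a new paragraph that\nshould be separate.\n"

# 1. Remove the newline, keep the surrounding horizontal whitespace
keep_format = text.gsub(/(?<!\s)([^\S\n]*)\n([^\S\n]*)(?!\s)/, '\1\2')

# 2. Remove the newline and the surrounding whitespace, substitute one space
to_space = text.gsub(/(?<!\s)[^\S\n]*\n[^\S\n]*(?!\s)/, ' ')

# 3. Like 2, but leave newlines at the start or end of the string alone:
#    (?<=\S) and (?=\S) demand a real non-whitespace character on each side
to_space_inner = text.gsub(/(?<=\S)[^\S\n]*\n[^\S\n]*(?=\S)/, ' ')

# 4. Like 1, with the same string-boundary guard as 3
keep_format_inner = text.gsub(/(?<=\S)([^\S\n]*)\n([^\S\n]*)(?=\S)/, '\1\2')

[keep_format, to_space, to_space_inner, keep_format_inner].each { |result| p result }
```

Options 1 and 4 produce "linesince" and "thatshould", which is exactly the loss of spacing described above; 2 and 3 are usually what you want.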
Example breakdown of the regex (this is the minimum needed to isolate a single newline):
(?<!\s) # Not whitespace behind us (text, number, punctuation, etc.)
[^\S\n]* # 0 or more whitespace characters, but no newlines
\n # The newline we want to remove
[^\S\n]* # 0 or more whitespace characters, but no newlines
(?!\s) # Not whitespace in front of us (text, number, punctuation, etc.)
Answer 3 (score: 0)
Well, there's this:
s.gsub /([^\n])\n([^\n])/, '\1 \2'
It won't do anything with leading or trailing newlines. If you don't want leading or trailing whitespace at all, you'll want this variation:
s.gsub(/([^\n])\n([^\n])/, '\1 \2').strip
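A quick check of what strip adds, on made-up input that starts and ends with a newline: the regex needs a non-newline character on both sides of a newline, so it cannot touch the outer ones, and strip removes them afterwards.

```ruby
s = "\nThis should all be on one line\nsince it's one sentence.\n"

# Only the inner newline has a non-newline character on both sides
joined = s.gsub(/([^\n])\n([^\n])/, '\1 \2')

# strip then removes the leftover leading and trailing whitespace
stripped = joined.strip

p joined    # => "\nThis should all be on one line since it's one sentence.\n"
p stripped  # => "This should all be on one line since it's one sentence."
```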
Answer 4 (score: 0)
$ ruby -00 -pne 'BEGIN{$\="\n\n"};$_.gsub!(/\n+/," ")' file
This should all be on one line since it's one sentence.

This is a new paragraph that should be separate.
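The same paragraph-at-a-time idea can be done inside a Ruby program instead of on the command line. This sketch swaps the -00 paragraph mode for String#split on blank lines (a different technique, named plainly here); the helper name unwrap_paragraphs is hypothetical, and it treats a line containing only spaces as a paragraph break too. It does not keep a trailing newline at the end of the text.

```ruby
# Hypothetical helper: unwrap each paragraph onto one line, keep blank-line
# separation between paragraphs.
def unwrap_paragraphs(txt)
  # Split wherever a line holding only whitespace (or nothing) separates text
  paragraphs = txt.split(/\n\s*\n/)
  # Inside each paragraph, join the lines with a single space
  joined = paragraphs.map { |para| para.strip.gsub(/\s*\n\s*/, ' ') }
  # Drop empty pieces produced by leading blank lines, then rejoin
  joined.reject(&:empty?).join("\n\n")
end

puts unwrap_paragraphs("This should all be on one line\nsince it's one sentence.\n\nThis is a new paragraph that\nshould be separate.\n")
```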