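Many of my documents are written in LaTeX, which works well with distributed workflows and version control if the source is formatted properly. Specifically, I like to format the text with one sentence per line.
My problem is that I have some legacy files that do not follow this formatting policy, and I would like to convert them automatically. I assumed some combination of sed and/or awk would make this simple, but I am running into trouble.
I am trying to convert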
This is some unformatted
text that does not have a sentence on one line.
This is a new unformatted paragraph
that does not follow the rule either.
This line \\ has a break in it.
into
This is some unformatted text that does not have a sentence on one line.
This is a new unformatted paragraph that does not follow the rule either.
This line \\
has a break in it.
My sed/awk attempt so far is:
awk ' /^$/ { print "\n"; } /./ { printf("%s", $0); } END { print; } ' <filename> | sed -e $'s/\. /\.\\\n/g'
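This gets me most of the way there, but I cannot get the \\ followed by a newline character to work properly.
Any help would be greatly appreciated.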
Answer 0 (score: 1)
Input
$ cat text
This is some unformatted
text that does not have a sentence on one line.
This is a new unformatted paragraph
that does not follow the rule either.
This line \\ has a break in it.
This line too \\ contains break.
This is a normal line.
Script
$ awk 'BEGIN{RS=".";}
{$0=gensub(/([[:print:]?])\n/,"\\1 ","g");
$0=gensub(/(\\\\) /,"\\1\n","g");
printf "%s.",$0}
END{printf "\n"}' text
Output
This is some unformatted text that does not have a sentence on one line.
This is a new unformatted paragraph that does not follow the rule either.
This line \\
has a break in it.
This line too \\
contains break.
This is a normal line .
Note: this assumes you have GNU awk (gensub is a gawk extension).
Answer 1 (score: 1)
$ awk -v RS= -v ORS='\n\n' -F'\\\\\\\\[[:space:]]*' -v OFS='\n' '{gsub(/\n/," "); $1=$1}1' file
This is some unformatted text that does not have a sentence on one line.
This is a new unformatted paragraph that does not follow the rule either.
This line
has a break in it.
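Note that in the output above the \\ itself is gone: the field separator matches it, and OFS replaces it with a bare newline. If the \\ has to survive, as in the desired output from the question, a sketch along the following lines might work. It uses only POSIX awk features, so it should not need gawk. The function name reflow is a hypothetical helper, and the code assumes that paragraphs are separated by blank lines, that every sentence ends in a period followed by a space, and that a \\ to be broken is followed by at least one space. Abbreviations such as "e.g. " would be split as well.

```shell
#!/bin/sh
# reflow (hypothetical helper name): read LaTeX text on stdin and write it
# back with one sentence per line, keeping "\\" and breaking right after it.
reflow() {
  awk '
    BEGIN { RS = "" }                 # paragraph mode: blank lines separate records
    {
      gsub(/[ \t]*\n[ \t]*/, " ")     # join the wrapped lines of one paragraph
      gsub(/\. +/, ".\n")             # start a new line after each sentence
      out = ""
      # Walk over every "\\" that is followed by spaces, keep the two
      # backslashes, and replace the spaces with a newline.
      while (match($0, /\\\\ +/)) {
        out = out substr($0, 1, RSTART + 1) "\n"
        $0 = substr($0, RSTART + RLENGTH)
      }
      if (NR > 1) print ""            # keep one blank line between paragraphs
      print out $0
    }
  '
}
```

A possible invocation, with placeholder file names, would be reflow < legacy.tex > converted.tex. The loop with match and substr is used instead of a gsub replacement string because backslash handling in replacement strings differs between awk implementations.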