I have some sloppy text that I need to clean up. Somehow, random line breaks got inserted in the middle of paragraphs.
This is a paragraph
and it got broken into two lines.
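The manual way to handle this is
Is there a way to accomplish this with find and replace? I know I can use ^[a-z]
with "Case sensitive" checked to find the offending lines, but that is as far as I have gotten.
I am just starting to learn the power of pattern matching, and I have solved all my other cleanup problems, but this one still stumps me.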
Answer 0 (score: 0)
In awk (on Linux):
cat file
This is a paragraph
and it got broken into two lines.
This line is fine and should be printed.
Here is another
that has been broken.
awk 'NR>1 {printf "%s"(substr($0,1,1)~/^[[:lower:]]$/?FS:RS),a} {a=$0} END {print a}' file
This is a paragraph and it got broken into two lines.
This line is fine and should be printed.
Here is another that has been broken.
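For readers who do not have awk at hand, the same idea can be sketched in Python: remember the lines seen so far, and glue any line that starts with a lowercase letter onto the previous one. This is not the answerer's code; the function name join_broken_lines is made up for illustration, and str.islower() is only assumed to be close enough to awk's [[:lower:]] class for plain English text.

```python
def join_broken_lines(text):
    """Join each line that starts with a lowercase letter onto the previous line."""
    merged = []
    for line in text.splitlines():
        # Same heuristic as the awk one-liner: a lowercase first character
        # marks the line as a continuation of the line before it.
        if merged and line[:1].islower():
            merged[-1] += " " + line
        else:
            merged.append(line)
    return "\n".join(merged)


sample = (
    "This is a paragraph\n"
    "and it got broken into two lines.\n"
    "This line is fine and should be printed.\n"
    "Here is another\n"
    "that has been broken."
)
print(join_broken_lines(sample))
```

Like the awk version, this wrongly joins a genuine new paragraph that happens to begin with a lowercase letter, so it is worth skimming the result.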
Answer 1 (score: 0)
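If there is really nothing else to handle, search for \n([a-z])
(with the "Case sensitive" and "Grep" matching options enabled) and replace with " \1"
(without the quotes). The search expression has no leading space, while the replacement does have one.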
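The same search and replace can be tried outside the editor. Below is a minimal sketch using Python's re module, assuming the editor's Grep mode treats \n and \1 the way Python does; the function name unwrap_lowercase_continuations is hypothetical.

```python
import re


def unwrap_lowercase_continuations(text):
    # Search:  a newline followed by one lowercase ASCII letter (captured).
    # Replace: a space followed by the captured letter, so the newline
    #          becomes a space and the letter is kept.
    return re.sub(r"\n([a-z])", r" \1", text)


broken = "Here is another\nthat has been broken.\nThis line is fine."
print(unwrap_lowercase_continuations(broken))
```

Python's re is case-sensitive by default, which corresponds to having "Case sensitive" checked in the editor.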