Keep single blank lines in an srt file by removing ctrl-M characters and double blank lines

Time: 2018-05-03 20:00:06

Tags: linux awk text-processing srt

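We process a lot of srt files on Linux to generate derivatives, but some of them contain ctrl-M characters because they were created on Windows. Right now I run two commands to check for and strip the hidden characters: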

tr -d '\015' <${file}.srt >${file}.srt

awk '/^$/{ if (! blank++) print; next } { blank=0; print }'  ${file}.srt | tee ${file}.srt

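However, some srt files still slip through these commands with their ctrl-M characters intact. Does anyone have a solution that keeps only a single blank line between subtitle entries? For example, if the srt file before processing looks like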

1
00:00:05,569 --> 00:00:07,569
Welcome to this overview of ShareStream, 


2
00:00:07,820 --> 00:00:11,940
which is a new digital streaming service
from Information Technology Services


3
00:00:11,940 --> 00:00:13,740
at the University of Iowa.

then after removing the ctrl-M characters and the extra blank lines it should be

1
00:00:05,569 --> 00:00:07,569
Welcome to this overview of ShareStream, 

2
00:00:07,820 --> 00:00:11,940
which is a new digital streaming service
from Information Technology Services

3
00:00:11,940 --> 00:00:13,740
at the University of Iowa.

Thanks for any help!

1 Answer:

Answer 0: (score: 1)

The UNIX command for removing those line-ending control-Ms is

dos2unix
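
As a rough sketch of how this might be applied: `dos2unix` rewrites the file it is given, so no redirection is needed. Where it is not installed, `tr -d '\r'` can stand in, provided the output goes to a temporary file first; redirecting into the same file that is being read (`<f >f`) empties it before it is read. Note that `tr` removes every carriage return, not only those in front of a newline. The helper name `strip_cr` and the file `sample.srt` are hypothetical.

```shell
#!/bin/sh
# strip_cr FILE: remove carriage returns (ctrl-M) from FILE in place.
# Hypothetical helper; prefers dos2unix, falls back to tr via a temp file.
strip_cr() {
    if command -v dos2unix >/dev/null 2>&1; then
        dos2unix "$1"
    else
        tmp=$(mktemp) || return 1
        # Write to the temp file first, then copy back over the original
        tr -d '\r' <"$1" >"$tmp" && cat "$tmp" >"$1"
        rm -f "$tmp"
    fi
}

# Demo on a hypothetical sample file with Windows line endings
printf '1\r\n00:00:05,569 --> 00:00:07,569\r\nWelcome\r\n' > sample.srt
strip_cr sample.srt
```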

The UNIX command for squeezing multiple blank lines between records into a single blank line is:

awk -v RS= -v ORS='\n\n' '1'
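
A minimal sketch of the two steps chained together, assuming hypothetical file names `in.srt` and `out.srt`. With `RS` set to the empty string, awk reads in paragraph mode, where any run of blank lines separates records, and `ORS='\n\n'` writes each record followed by exactly one blank line. The carriage returns have to go first: a line holding only a ctrl-M is not empty, so awk would not treat it as a record separator. Here `tr -d '\r'` stands in for `dos2unix` so the whole thing runs as one pipeline.

```shell
#!/bin/sh
# Build a hypothetical input with CRLF line endings and a doubled blank line
printf '1\r\n00:00:05,569 --> 00:00:07,569\r\nWelcome to this overview of ShareStream,\r\n\r\n\r\n2\r\n00:00:07,820 --> 00:00:11,940\r\nwhich is a new digital streaming service\r\n' > in.srt

# Strip carriage returns, then squeeze blank-line runs between records.
# The result goes to a NEW file, never to the file being read.
tr -d '\r' < in.srt | awk -v RS= -v ORS='\n\n' '1' > out.srt
```

Once `out.srt` looks right, it can replace the original with `mv out.srt in.srt`. The output ends with one blank line after the last entry, since `ORS` is also written after the final record.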