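We process a lot of srt files on Linux to generate derivatives, but some of them contain Ctrl-M characters because they were created on Windows. Right now I run two commands to find and strip the hidden characters: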
tr -d '\015' <${file}.srt >${file}.srt
awk '/^$/{ if (! blank++) print; next } { blank=0; print }' ${file}.srt | tee ${file}.srt
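But some srt files still slip past these commands and still contain Ctrl-M characters. In that case, does anyone have a solution that keeps only a single blank line between subtitle entries? So if the preprocessed srt file looks like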
1
00:00:05,569 --> 00:00:07,569
Welcome to this overview of ShareStream,
2
00:00:07,820 --> 00:00:11,940
which is a new digital streaming service
from Information Technology Services
3
00:00:11,940 --> 00:00:13,740
at the University of Iowa.
then after removing the Ctrl-M characters and/or the extra blank lines it should be
1
00:00:05,569 --> 00:00:07,569
Welcome to this overview of ShareStream,

2
00:00:07,820 --> 00:00:11,940
which is a new digital streaming service
from Information Technology Services

3
00:00:11,940 --> 00:00:13,740
at the University of Iowa.
Thanks for any help!
Answer 0: (score: 1)
The UNIX command for removing those line-ending Ctrl-Ms is
dos2unix
The UNIX command for squeezing multiple blank lines between records into a single blank line is:
awk -v RS= -v ORS='\n\n' '1'
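One likely reason files slip through the commands in the question: with `<${file}.srt >${file}.srt` the shell truncates the output file before `tr` ever reads it, and `awk ... ${file}.srt | tee ${file}.srt` races in a similar way. Below is a minimal sketch that chains the two steps through a temporary file instead. The function name `clean_srt` is made up for illustration, `tr -d '\015'` stands in for `dos2unix` in case that tool is not installed, and `mktemp` is assumed to be available (it is on typical Linux systems).

```shell
#!/bin/sh
# Hypothetical helper (not from the thread): normalize one .srt file in place.
clean_srt() {
    src=$1
    tmp=$(mktemp) || return 1
    # Step 1: tr deletes every carriage return (octal 015, displayed as ^M).
    # Step 2: awk in paragraph mode (empty RS) treats any run of blank lines
    #         as one record separator; ORS then writes exactly one blank
    #         line after each record.
    if tr -d '\015' < "$src" | awk -v RS= -v ORS='\n\n' '1' > "$tmp"; then
        # Copy back only after the pipeline has finished, so the source is
        # never read and truncated at the same time.
        cat "$tmp" > "$src"
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

It could be applied to a batch with something like `for f in *.srt; do clean_srt "$f"; done`. The carriage returns have to go first: a line holding only a Ctrl-M is not empty, so awk's paragraph mode would not treat it as a separator. Note that this also leaves one blank line after the last record, and that lines containing only spaces or tabs still do not count as blank.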