Unbreak specific lines of text

Time: 2014-07-14 18:52:03

Tags: bash

I have a subtitle file and I want to unbreak all of the subtitles. An example:

1
00:02:08,315 --> 00:02:10,786
Hello Jim.
How are you?

2
00:02:10,869 --> 00:02:13,192
I'm well.
And you?

I want to convert it to:

1
00:02:08,315 --> 00:02:10,786
Hello Jim. How are you?

2
00:02:10,869 --> 00:02:13,192
I'm well. And you?

The subtitle numbers and timecodes should not be joined. How can this be done with sed?

5 answers:

Answer 0 (score: 3)

You could do:

awk 'BEGIN { RS = ""; FS = "\n" }
     NR > 1 { print "" }
     { print $1; print $2;
       for (i = 3; i < NF; ++i) printf "%s ", $i;
       print $NF;
     }' your_file.txt

Output:

1
00:02:08,315 --> 00:02:10,786
Hello Jim. How are you?

2
00:02:10,869 --> 00:02:13,192
I'm well. And you?
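
To try the paragraph-mode idea without a file on disk, the sketch below wraps it in a shell function and feeds it the sample from the question. The names `unbreak_srt` and `sample` are made up for this illustration. Two things differ from the command above and are assumptions on my part: carriage returns are stripped first (many .srt files use CRLF line endings), and a block with no text lines does not get its timecode printed twice.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption): joins the text lines of
# each subtitle block and leaves the number and timecode lines untouched.
unbreak_srt() {
    # Assumption: strip carriage returns so CRLF files split into clean lines.
    tr -d '\r' | awk 'BEGIN { RS = ""; FS = "\n" }   # one record per block
        NR > 1 { print "" }                # blank line between blocks
        {
            print $1                       # subtitle number
            if (NF >= 2) print $2          # timecode
            text = ""
            for (i = 3; i <= NF; ++i)      # join the rest with single spaces
                text = text (i > 3 ? " " : "") $i
            if (NF >= 3) print text
        }'
}

# The sample from the question, generated inline.
sample() {
    printf '%s\n' '1' '00:02:08,315 --> 00:02:10,786' 'Hello Jim.' 'How are you?' '' \
                  '2' '00:02:10,869 --> 00:02:13,192' "I'm well." 'And you?'
}

sample | unbreak_srt
```

For a real file, `unbreak_srt < your_file.txt > joined.srt` would be the equivalent call.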

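Answer 1 (score: 0)

This small awk script will do the job. It is a bit more complicated than it needs to be, but it could serve as a basis for more advanced processing. Maybe...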

awk 'BEGIN                     { state = "copy" }
     (state == "copy")         { print }
     /-->/                     { state = "text"; next }
     /.+/ && (state == "text") { printf("%s ",$0); next }
     /^$/                      { printf("\n\n"); state = "copy"; next }
     END                       { printf("\n") }
    ' < sub.txt

Given your input file, this produces:

1
00:02:08,315 --> 00:02:10,786
Hello Jim. How are you? 

2
00:02:10,869 --> 00:02:13,192
I'm well. And you? 
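
Note that each joined line in that output ends with a trailing space, because every text line is printed with `"%s "`. If that matters, one possible variation (a sketch, not the script above) is to buffer the text lines and only put a space between them. The function name `unbreak_states`, the helper `sample`, and the `buf` variable are assumptions made for this illustration.

```shell
#!/bin/sh
# Hypothetical wrapper (the name is an assumption) around a two-state awk
# script: "copy" passes lines through, "text" buffers lines to be joined.
unbreak_states() {
    awk '
        state == "text" && /^$/ {      # a blank line ends the text block
            if (buf != "") print buf
            print ""
            buf = ""; state = "copy"; next
        }
        state == "text" {              # add a space only between lines
            buf = (buf == "" ? $0 : buf " " $0); next
        }
        { print }                      # number and timecode pass through
        /-->/ { state = "text" }       # text starts after the timecode
        END { if (state == "text" && buf != "") print buf }  # flush last block
    '
}

# The sample from the question, generated inline.
sample() {
    printf '%s\n' '1' '00:02:08,315 --> 00:02:10,786' 'Hello Jim.' 'How are you?' '' \
                  '2' '00:02:10,869 --> 00:02:13,192' "I'm well." 'And you?'
}

sample | unbreak_states
```

Like the original, this treats any line containing `-->` as a timecode, so a spoken line containing that sequence would confuse it.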

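Edit: After looking at the sample file you linked in a comment on another answer, I can only guess that you want to merge consecutive <i>...</i> lines. So this simple Perl trick should be enough: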

sh$ unzip 56939b22f5174a770a79f6b0b0cf7caaee1c9dfb.zip
Archive:  56939b22f5174a770a79f6b0b0cf7caaee1c9dfb.zip
inflating: Red.Planet.2000.1080p.REPACK.BluRay.x264-7SinS.srt  

sh$ perl -0pe 's|</i>\r\n<i>| |g' < Red.Planet.2000.1080p.REPACK.BluRay.x264-7SinS.srt

1
00:00:35,661 --> 00:00:40,792
<i>By the year 2000, we had begun to overpopulate, pollute and poison our planet...</i>

2
00:00:41,208 --> 00:00:43,176
<i>...faster than we could clean it up.</i>

3
00:00:43,377 --> 00:00:48,053
<i>We ignored the problem for as long as we could but we were kidding ourselves.</i>

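Answer 2 (score: 0)

If all subtitle blocks are separated by blank lines, and you want to keep the first two lines of each block and join the rest with spaces, then you can use Perl: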

perl -F'\n' -aln00e 'print "$F[0]\n$F[1]\n", (join" ",@F[2..$#F]), "\n"' myfile.txt

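However, this breaks if there are blank lines inside the spoken text. I assume you won't mind deleting blank lines that occur within the spoken lines. If so, just add a preprocessing step: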

perl -lp0777e 's/\n\n+(?!\d+\n\d\d:\d\d:\d\d,\d\d\d\s*-->)/\n/g' myfile.txt

Answer 3 (score: 0)

A solution in the TXR language:

@(repeat)
@num
@fromtime --> @totime
@(collect)
@line
@(until)

@(end)
@(output)
@num
@fromtime --> @totime
@(rep)@line @(last)@line@(end)
@(end)
@(end)

Run:

$ txr unbreak.txr sub.srt 
1
00:02:08,315 --> 00:02:10,786
Hello Jim. How are you?
2
00:02:10,869 --> 00:02:13,192
I'm well. And you?

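The desired output is easily achieved, even though we precisely extract more features of the SRT file than are needed to get the job done. We can easily bend the code to more complicated transformations.

Answer 4 (score: 0)

This command line also works: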

cat red.srt | tr '\012' '\040' | sed 's/[0-9]\+ ..:..:..,... --> ..:..:..,.../\n\0\n/g' | sed 's/^[0-9]\+ /\n\0\n/g' | sed 's/^ *//g; s/ \+/ /g; s/ *$//g' | sed '1,2d' > final.srt

I know this solution isn't elegant, but it works well for me.
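
Since the question asked for sed specifically, the same result can probably be had from a single sed invocation instead of a tr/sed pipeline. The following is only a sketch under some assumptions: every timecode line contains `-->`, every block has at least one text line, blocks are separated by exactly one empty line, and the file uses LF line endings. The function name `unbreak_sed` and the helper `sample` are made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper (the name is an assumption). How the sed script works:
#   /-->/   on a timecode line ...
#   n       print it and load the first text line
#   :a      loop: unless this is the last input line, append the next line (N);
#           if the appended line is not empty, turn the newline into a space
#           and repeat; an empty appended line ends the block and is kept as
#           the separator.
unbreak_sed() {
    sed '/-->/{
n
:a
$!{
N
/\n$/!{
s/\n/ /
ba
}
}
}'
}

# The sample from the question, generated inline.
sample() {
    printf '%s\n' '1' '00:02:08,315 --> 00:02:10,786' 'Hello Jim.' 'How are you?' '' \
                  '2' '00:02:10,869 --> 00:02:13,192' "I'm well." 'And you?'
}

sample | unbreak_sed
```

A block with a timecode but no text would be joined with the following subtitle number, so this is less forgiving than the awk versions.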