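I have a subtitle file and want to unbreak every subtitle, that is, join the text lines of each subtitle into a single line. For example: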
1
00:02:08,315 --> 00:02:10,786
Hello Jim.
How are you?

2
00:02:10,869 --> 00:02:13,192
I'm well.
And you?
I want to convert this to:
1
00:02:08,315 --> 00:02:10,786
Hello Jim. How are you?

2
00:02:10,869 --> 00:02:13,192
I'm well. And you?
The subtitle numbers and timecodes should not be joined. How can I do this with sed?
Answer 0 (score: 3)
You can use awk in paragraph mode:
awk 'BEGIN { RS = ""; FS = "\n" }
NR > 1 { print "" }
{ print $1; print $2;
for (i = 3; i < NF; ++i) printf "%s ", $i;
print $NF;
}' your_file.txt
Output:
1
00:02:08,315 --> 00:02:10,786
Hello Jim. How are you?

2
00:02:10,869 --> 00:02:13,192
I'm well. And you?
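For readers who prefer a scripting language, the same paragraph-mode idea can be sketched in Python. This is only an illustration, not part of the original answer: the function name `unbreak_blocks` is hypothetical, and the sketch assumes that subtitle blocks are separated by blank lines and that the first two lines of each block are the index and the timecode.

```python
import re


def unbreak_blocks(srt_text: str) -> str:
    """Join the text lines of each blank-line-separated subtitle block."""
    # Normalise CRLF endings and drop leading/trailing newlines.
    text = srt_text.replace("\r\n", "\n").strip("\n")
    out = []
    # Split on runs of blank lines, similar to awk's RS = "".
    for block in re.split(r"\n{2,}", text):
        lines = block.split("\n")
        # Assumption: line 1 is the index, line 2 is the timecode.
        head, body = lines[:2], lines[2:]
        if body:
            head.append(" ".join(body))
        out.append("\n".join(head))
    return "\n\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "1\n00:02:08,315 --> 00:02:10,786\nHello Jim.\nHow are you?\n\n"
        "2\n00:02:10,869 --> 00:02:13,192\nI'm well.\nAnd you?\n"
    )
    print(unbreak_blocks(sample), end="")
```

Unlike the awk version, a block that has only an index and a timecode is printed without repeating its last line.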
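Answer 1 (score: 0)
This small awk script will do the job. It is a little more complicated than it needs to be, but it could serve as a basis for more advanced processing. Maybe...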
awk 'BEGIN { state = "copy" }
(state == "copy") { print }
/-->/ { state = "text"; next }
/.+/ && (state == "text") { printf("%s ",$0); next }
/^$/ { printf("\n\n"); state = "copy"; next }
END { printf("\n") }
' < sub.txt
Given your input file, this produces:
1
00:02:08,315 --> 00:02:10,786
Hello Jim. How are you?

2
00:02:10,869 --> 00:02:13,192
I'm well. And you?
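The state-machine idea (copy lines until a timecode is seen, then collect text until a blank line) can also be sketched as a Python generator. This is a loose variant rather than a line-for-line port of the awk script: the name `unbreak_stream` is hypothetical, and, unlike the awk version, it emits no trailing space after the joined text.

```python
def unbreak_stream(lines):
    """Yield output lines, joining the text lines that follow a timecode."""
    pending = []      # text lines collected for the current subtitle
    in_text = False   # True once a "-->" timecode line has been seen
    for raw in lines:
        line = raw.rstrip("\r\n")
        if in_text and line:
            # Still inside the subtitle text: keep collecting.
            pending.append(line)
        elif in_text:
            # A blank line ends the subtitle: emit the joined text.
            yield " ".join(pending)
            yield ""
            pending, in_text = [], False
        else:
            # "copy" state: pass index and timecode lines through unchanged.
            yield line
            if "-->" in line:
                in_text = True
    if in_text:
        # The input ended without a final blank line.
        yield " ".join(pending)


if __name__ == "__main__":
    import sys
    for out_line in unbreak_stream(sys.stdin):
        print(out_line)
```

Because it reads its input one line at a time, the generator never holds more than one subtitle in memory, which may matter for very large files.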
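Edit: After looking at the sample file you posted in a comment on another answer, I can only guess that you want to merge consecutive <i>...</i> lines. If so, this simple Perl trick is enough: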
sh$ unzip 56939b22f5174a770a79f6b0b0cf7caaee1c9dfb.zip
Archive: 56939b22f5174a770a79f6b0b0cf7caaee1c9dfb.zip
inflating: Red.Planet.2000.1080p.REPACK.BluRay.x264-7SinS.srt
sh$ perl -0pe 's|</i>\r\n<i>| |g' < Red.Planet.2000.1080p.REPACK.BluRay.x264-7SinS.srt
1
00:00:35,661 --> 00:00:40,792
<i>By the year 2000, we had begun to overpopulate, pollute and poison our planet...</i>
2
00:00:41,208 --> 00:00:43,176
<i>...faster than we could clean it up.</i>
3
00:00:43,377 --> 00:00:48,053
<i>We ignored the problem for as long as we could but we were kidding ourselves.</i>
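The same substitution can be sketched in Python with `re.sub`, which replaces every match by default. The function name `merge_italic_lines` is hypothetical, and the pattern uses `\r?\n`, so it accepts both CRLF and LF line endings; that is a small extension over the Perl command, which matches `\r\n` only.

```python
import re


def merge_italic_lines(srt_text: str) -> str:
    """Merge consecutive <i>...</i> lines into a single <i>...</i> line."""
    # Replace each "</i>", line break, "<i>" sequence with one space.
    return re.sub(r"</i>\r?\n<i>", " ", srt_text)


if __name__ == "__main__":
    # Made-up sample text, for illustration only.
    sample = "<i>first half,</i>\r\n<i>second half.</i>\r\n"
    print(merge_italic_lines(sample), end="")
```

To keep CRLF endings intact when processing a real file, open it with `newline=""` so that Python does not translate line endings on read or write.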
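Answer 2 (score: 0)
If all subtitle blocks are separated by blank lines, and you want to keep the first two lines of each block and join the rest with spaces, you can use Perl: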
perl -F'\n' -aln00e 'print "$F[0]\n$F[1]\n", (join" ",@F[2..$#F]), "\n"' myfile.txt
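However, this breaks if there are blank lines inside the spoken text. I assume you would not mind removing blank lines inside the dialogue. If so, just add a preprocessing step: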
perl -lp0777e 's/\n\n+(?!\d+\n\d\d:\d\d:\d\d,\d\d\d\s*-->)/\n/g' myfile.txt
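To see what this preprocessing step does, here is the same regular expression applied with Python's `re.sub`. The function name `drop_inner_blank_lines` and the sample text in the usage block are made up for illustration. The negative lookahead leaves a run of blank lines alone only when it is followed by an index line and a timecode line.

```python
import re

# Blank-line run NOT followed by "<digits>\n<hh:mm:ss,mmm> -->".
INNER_BLANK = re.compile(r"\n\n+(?!\d+\n\d\d:\d\d:\d\d,\d\d\d\s*-->)")


def drop_inner_blank_lines(srt_text: str) -> str:
    """Remove blank lines that do not start a new subtitle block."""
    return INNER_BLANK.sub("\n", srt_text)


if __name__ == "__main__":
    # Made-up sample with a blank line inside the first subtitle's text.
    sample = (
        "1\n00:00:01,000 --> 00:00:02,000\nFirst line.\n\nSecond line.\n\n"
        "2\n00:00:03,000 --> 00:00:04,000\nBye.\n"
    )
    print(drop_inner_blank_lines(sample), end="")
```

The sketch assumes LF line endings. A file with CRLF endings would need to be normalised first, because the pattern matches `\n` only.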
Answer 3 (score: 0)
A solution in the TXR language:
@(repeat)
@num
@fromtime --> @totime
@(collect)
@line
@(until)

@(end)
@(output)
@num
@fromtime --> @totime
@(rep)@line @(last)@line@(end)
@(end)
@(end)
Run:
$ txr unbreak.txr sub.srt
1
00:02:08,315 --> 00:02:10,786
Hello Jim. How are you?
2
00:02:10,869 --> 00:02:13,192
I'm well. And you?
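The desired output is easy to achieve even though this script parses more of the SRT file's structure than is strictly needed to get the job done. The code can easily be bent into more complicated transformations.
Answer 4 (score: 0)
This command line also works: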
cat red.srt | tr '\012' '\040' | sed 's/[0-9]\+ ..:..:..,... --> ..:..:..,.../\n\0\n/g' | sed 's/^[0-9]\+ /\n\0\n/g' | sed 's/^ *//g; s/ \+/ /g; s/ *$//g' | sed '1,2d' > final.srt
I know this solution is not elegant, but it works well for me.