VBScript: remove duplicate numbers from a subtitle file by renumbering

Time: 2017-12-18 08:51:20

Tags: regex replace vbscript pattern-matching subtitle

My subtitle (.srt) file looks like this:

2
00:04:22,504 --> 00:04:23,520
Hello?

3
00:04:27,860 --> 00:04:29,112
Hey wait!
Hello!

3
00:06:18,860 --> 00:06:21,112
Uhh!

3
00:06:29,860 --> 00:06:32,112
Ah!

4
00:07:19,232 --> 00:07:21,284
What are you doing here?

5
00:07:21,608 --> 00:07:22,708
Tell me!

...

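As you can see, the number 3 appears three times in this file. I want to fix this by renumbering the entire subtitle file (I assume that is the only option, because the duplication occurs in several places in the file).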

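I wrote the following script to select the file and replace each duplicated number with a newly generated one (the iteration count), but it does not work.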

Dim strFile, objFS

strFile = SelectFile( )
If strFile = "" Then
    WScript.Echo "No file selected."
End If


Function SelectFile( )
    Dim objExec, strMSHTA, wshShell

    SelectFile = ""

    strMSHTA = "mshta.exe ""about:" & "<" & "input type=file id=FILE>" _
             & "<" & "script>FILE.click();new ActiveXObject('Scripting.FileSystemObject')" _
             & ".GetStandardStream(1).WriteLine(FILE.value);close();resizeTo(0,0);" & "<" & "/script>"""

    Set wshShell = CreateObject( "WScript.Shell" )
    Set objExec = wshShell.Exec( strMSHTA )

    SelectFile = objExec.StdOut.ReadLine( )

    Set objExec = Nothing
    Set wshShell = Nothing
End Function

Set objFS = CreateObject("Scripting.FileSystemObject")
Set objFile = objFS.OpenTextFile(strFile)
Set objFile2 = objFS.OpenTextFile(strFile, 8, True)
x = 0
Do Until objFile.AtEndOfStream
    strLine = objFile.ReadLine
    Set objRegEx = CreateObject("VBScript.RegExp")
    objRegEx.Global = True
    objRegEx.Pattern = "^\d+$"
    Set colMatches = objRegEx.Execute(strLine)
    If colMatches.Count > 0 Then
        x = x + 1
        strLine = x
        strNewLine = Replace(strLine,strLine,x)
        objFile2.WriteLine strLine
    End If
Loop

Can anyone help me figure out how to make this work?

2 answers:

Answer 0: (score: 1)

In VBScript, use a regular expression with a replacement function and a global counter:

f = "C:\path\to\your.srt"
n = 1  'global counter

Function Renumber(m, g1, g2, pos, src)
  Renumber = g1 & n & g2
  n = n + 1  'increment global counter after current value was used
End Function

Set re = New RegExp
re.Pattern = "(^|\r\n\r\n)\d+(\r\n)"
re.Global = True

Set fso = CreateObject("Scripting.FileSystemObject")
txt = fso.OpenTextFile(f).ReadAll
txt = re.Replace(txt, GetRef("Renumber"))
fso.OpenTextFile(f, 2).Write txt

Answer 1: (score: 0)

If you have a Unix box or a Unix virtual machine, or can otherwise emulate a Unix environment that provides awk, you can do this in one line:

Command:

awk 'BEGIN{c=1} $0~/^[0-9]+$/ {print c++} $0~/[a-zA-Z,:\-!?]|^$/{print}' input_sub.txt > output_sub.txt
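The one-liner decides what to print from each line's content alone, so a subtitle text line made only of digits would be renumbered too, and a line with none of the listed characters (for example `...`) would not be printed. A position-based variant is sketched below as a hedged alternative, not as the original command. It assumes that cues are separated by blank lines and that the index is the first line of each cue; the function name `renumber_srt` is made up for illustration. Trailing carriage returns are tolerated because .srt files created on Windows often use CRLF line endings.

```shell
#!/bin/sh
# Sketch: renumber SRT cue indices by position rather than by content.
# Assumption: cues are separated by blank lines, index is the cue's first line.
renumber_srt() {
  awk '
    BEGIN { n = 0; start = 1 }
    # Work on a copy without a trailing CR; remember whether one was present.
    { line = $0; cr = (sub(/\r$/, "", line) ? "\r" : "") }
    # Blank line: the next non-blank line opens a new cue.
    line == "" { start = 1; print; next }
    # Digits-only line at the start of a cue: replace it with the counter.
    start && line ~ /^[0-9]+$/ { printf "%d%s\n", ++n, cr; start = 0; next }
    # Timing and text lines pass through unchanged.
    { start = 0; print }
  ' "$1"
}
```

Usage would look like `renumber_srt input_sub.txt > output_sub.txt`. As with the one-liner, write to a separate output file rather than redirecting onto the input.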

Tested with this input:

2
00:04:22,504 --> 00:04:23,520
Hello?

3
00:04:27,860 --> 00:04:29,112
Hey wait!
Hello!

3
00:06:18,860 --> 00:06:21,112
Uhh!

3
00:06:29,860 --> 00:06:32,112
Ah!

4
00:07:19,232 --> 00:07:21,284
What are you doing here?

5
00:07:21,608 --> 00:07:22,708
Tell me!

Output:

1
00:04:22,504 --> 00:04:23,520
Hello?

2
00:04:27,860 --> 00:04:29,112
Hey wait!
Hello!

3
00:06:18,860 --> 00:06:21,112
Uhh!

4
00:06:29,860 --> 00:06:32,112
Ah!

5
00:07:19,232 --> 00:07:21,284
What are you doing here?

6
00:07:21,608 --> 00:07:22,708
Tell me!
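
Since the duplicates occur in several places in the file, it may help to list them before renumbering and to confirm afterwards that none remain. The following is a rough sketch under the same assumption as before (cues separated by blank lines, index on the first line of each cue); `check_srt_numbering` is a hypothetical name. It prints every cue index that differs from the expected running count and exits with status 1 if it found any.

```shell
#!/bin/sh
# Sketch: report cue indices that are not 1, 2, 3, ... in order.
# Assumption: cues are separated by blank lines, index is the cue's first line.
check_srt_numbering() {
  awk '
    BEGIN { expected = 1; start = 1; bad = 0 }
    # Drop a trailing CR so CRLF files are handled as well.
    { sub(/\r$/, "") }
    # Blank line: the next non-blank line opens a new cue.
    $0 == "" { start = 1; next }
    # Cue index: compare it with the running count.
    start && /^[0-9]+$/ {
      if ($0 + 0 != expected) {
        printf "line %d: index %s, expected %d\n", NR, $0, expected
        bad = 1
      }
      expected++
      start = 0
      next
    }
    # Any other line ends the "start of cue" state.
    { start = 0 }
    END { exit bad }
  ' "$1"
}
```

For example, `check_srt_numbering output_sub.txt && echo "numbering OK"` prints the message only when every index is in sequence.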