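I created a list from a subtitle file (.srt) in which each line sits at its own index. Now I want to remove some of those lines, specifically the ones that start with a time interval in the format "xx:xx:xx,xxx --> xx:xx:xx,xxx". I did some research (well, very superficial research on a complex topic, I might add) and tried to write the following sub: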
Private Shared Sub listCleaning(ByRef sList As List(Of String))
    For Each line As String In sList
        Dim pattern As String = "\b\d\d:\d\d:\d\d:\d\d,\d\d\d --> \b\d\d:\d\d:\d\d:\d\d,\d\d\d"
        Dim reg As New Regex(pattern)
        If line = "" Or Integer.TryParse(line, Nothing) Or reg.IsMatch(pattern) Then
            sList.Remove(line)
        End If
    Next
End Sub
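Now I have two questions:
1. Can anyone help me write a correct RegEx?
2. What is a proper way to iterate over the list and remove the unwanted lines?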
Thanks in advance.
EDIT:
OK, Tim solved most of it, but I still need a RegEx that matches the "xx:xx:xx,xxx --> xx:xx:xx,xxx" pattern. Anyone care to help?
Thanks in advance!
Answer 0 (score: 1)
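These are my changes to your original code:
First, I don't use a For Each loop but a more traditional For loop.
Second, the loop runs backwards, so each removal only shifts items that have already been checked.
Third, the regex pattern should be created outside the loop.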
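Private Shared Sub listCleaning(ByRef sList As List(Of String))
    ' Build the regex once, outside the loop
    Dim pattern As String = "\d{2}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2}:\d{2}:\d{2},\d{3}"
    Dim reg As New Regex(pattern)
    Dim x As Integer
    ' Walk backwards so a removal only shifts items that were already checked
    For x = sList.Count - 1 To 0 Step -1
        Dim line As String = sList(x)
        Console.WriteLine(line)   ' debug: echo each line as it is visited
        If line = "" OrElse Integer.TryParse(line, Nothing) OrElse reg.IsMatch(line) Then
            sList.RemoveAt(x)     ' remove by index instead of searching by value
        End If
    Next
End Sub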
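A quick way to see why the first two changes matter: the sketch below (the module and function names are made up for illustration) removes an item inside a For Each loop, which makes List(Of T) throw an InvalidOperationException on the next iteration, and then performs the same kind of removal with a backwards index loop and RemoveAt, which works.

```vbnet
Imports System
Imports System.Collections.Generic

Module ForEachRemovalDemo
    ' Returns True when removing inside For Each throws: List(Of T) detects
    ' that it was modified and its enumerator refuses to continue.
    Function ForEachRemoveThrows() As Boolean
        Dim sList As New List(Of String) From {"1", "", "Hello"}
        Try
            For Each line As String In sList
                If line = "" Then sList.Remove(line)
            Next
            Return False
        Catch ex As InvalidOperationException
            Return True
        End Try
    End Function

    ' Same idea, but walking backwards by index; returns how many items remain.
    Function BackwardRemoveCount() As Integer
        Dim sList As New List(Of String) From {"1", "", "Hello", ""}
        For x As Integer = sList.Count - 1 To 0 Step -1
            If sList(x) = "" Then sList.RemoveAt(x)
        Next
        Return sList.Count
    End Function

    Sub Main()
        Console.WriteLine(ForEachRemoveThrows())   ' True
        Console.WriteLine(BackwardRemoveCount())   ' 2
    End Sub
End Module
```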
Here is my test data:
Sub Main
    Dim sList As List(Of String) = New List(Of String)
    sList.Add("01:01:01,003 --> 02:02:02,003")
    sList.Add("sdsdfsdfsd03 --> 02:02:02,003")   ' malformed: should survive
    sList.Add("03:01:01,003 --> 02:02:02,003")
    sList.Add("04:01:01,003 --> 02:02:02,003")
    sList.Add("05:01:01,003 --> 02:02:02,003")
    sList.Add("06:AA:01,003 --> 02:02:02,003")   ' malformed: should survive
    sList.Add("07:01:01,003 --> 02:02:02,003")
    sList.Add("08:01:01,003 --> 02:02:02,003")
    sList.Add("09:01:01,003 --> 02:02:02 003")   ' missing comma: should survive
    Console.WriteLine("Call listCleaning with " & sList.Count.ToString & " elements")
    listCleaning(sList)
    Console.WriteLine("Returned with " & sList.Count.ToString & " elements")
    For Each line As String In sList
        Console.WriteLine(line)
    Next
End Sub
And I get this output:
Call listCleaning with 9 elements
09:01:01,003 --> 02:02:02 003
08:01:01,003 --> 02:02:02,003
07:01:01,003 --> 02:02:02,003
06:AA:01,003 --> 02:02:02,003
05:01:01,003 --> 02:02:02,003
04:01:01,003 --> 02:02:02,003
03:01:01,003 --> 02:02:02,003
sdsdfsdfsd03 --> 02:02:02,003
01:01:01,003 --> 02:02:02,003
Returned with 3 elements
sdsdfsdfsd03 --> 02:02:02,003
06:AA:01,003 --> 02:02:02,003
09:01:01,003 --> 02:02:02 003
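Since the question asks for lines that start with the timestamp, a slightly stricter variant anchors the answer's pattern with ^, so a line such as "Text 01:01:01,003 --> 02:02:02,003" is no longer matched. This is a sketch; IsTimingLine and the module name are hypothetical.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module SrtTimingPattern
    ' Same pattern as the answer, but ^ requires the timestamp at the start of the line.
    Private ReadOnly TimingLine As New Regex(
        "^\d{2}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2}:\d{2}:\d{2},\d{3}",
        RegexOptions.Compiled)

    Function IsTimingLine(line As String) As Boolean
        Return TimingLine.IsMatch(line)
    End Function

    Sub Main()
        Console.WriteLine(IsTimingLine("01:01:01,003 --> 02:02:02,003"))      ' True
        Console.WriteLine(IsTimingLine("09:01:01,003 --> 02:02:02 003"))      ' False
        Console.WriteLine(IsTimingLine("Text 01:01:01,003 --> 02:02:02,003")) ' False
    End Sub
End Module
```

Without the ^ the third call would return True, because IsMatch succeeds on a match anywhere in the string.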
Answer 1 (score: 0)
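Regarding point 2:
You can't modify a collection while you are iterating over it with For Each. So you either use List.RemoveAll (see the bottom) or LINQ: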
' reg is the timestamp Regex, created once outside the query
Dim dontRemove = From line In sList
                 Where line <> "" AndAlso
                       Not Integer.TryParse(line, Nothing) AndAlso
                       Not reg.IsMatch(line)
Now you can safely remove the other lines from the list, or simply build a new list from the ones to keep:
sList = dontRemove.ToList()
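Put together, the LINQ approach might look like the sketch below. It assumes the same timestamp pattern used elsewhere in this thread; KeepTextLines and the sample lines are illustrative, not part of the original answer. It returns a new list and leaves the input untouched.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module LinqFilterSketch
    ' Timestamp pattern built once and shared by every call
    Private ReadOnly reg As New Regex("\d{2}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2}:\d{2}:\d{2},\d{3}")

    ' Returns a new list with only the subtitle text: blank lines,
    ' cue numbers and timing lines are filtered out.
    Function KeepTextLines(sList As List(Of String)) As List(Of String)
        Dim dontRemove = From line In sList
                         Where line <> "" AndAlso
                               Not Integer.TryParse(line, Nothing) AndAlso
                               Not reg.IsMatch(line)
        Return dontRemove.ToList()
    End Function

    Sub Main()
        Dim sList As New List(Of String) From {
            "1", "00:00:01,000 --> 00:00:03,000", "First subtitle line", "",
            "2", "00:00:04,000 --> 00:00:06,000", "Second subtitle line"}
        For Each line In KeepTextLines(sList)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```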
If you are working with a List(Of T), the best option is List.RemoveAll, passing a predicate that says which items should be removed:
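' Requires Imports System.Text.RegularExpressions at the top of the file
Dim timeRegex = New Regex("\d{2}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2}:\d{2}:\d{2},\d{3}", RegexOptions.Compiled)
' Remove blank lines, cue numbers and lines that DO match the timestamp pattern
sList.RemoveAll(Function(line) line.Length = 0 _
    OrElse Integer.TryParse(line, Nothing) _
    OrElse timeRegex.IsMatch(line))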
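To go from the raw .srt text to the cleaned list in one step, the text can be split into lines and handed straight to RemoveAll. This sketch assumes the whole file has already been read into a string (for example with File.ReadAllText); SubtitleTextOnly and the sample text are hypothetical. It also treats whitespace-only lines as blank, a small deviation from line.Length = 0.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module SrtCleanupSketch
    Private ReadOnly timeRegex As New Regex(
        "\d{2}:\d{2}:\d{2},\d{3}\s+-->\s+\d{2}:\d{2}:\d{2},\d{3}",
        RegexOptions.Compiled)

    ' Splits raw .srt text on CRLF or LF, then drops blank lines,
    ' cue numbers and timing lines with a single RemoveAll call.
    Function SubtitleTextOnly(srtText As String) As List(Of String)
        Dim lines As New List(Of String)(srtText.Split({vbCrLf, vbLf}, StringSplitOptions.None))
        lines.RemoveAll(Function(line) line.Trim().Length = 0 OrElse
                                       Integer.TryParse(line, Nothing) OrElse
                                       timeRegex.IsMatch(line))
        Return lines
    End Function

    Sub Main()
        Dim srtText = "1" & vbCrLf &
                      "00:00:01,000 --> 00:00:03,000" & vbCrLf &
                      "First subtitle line" & vbCrLf &
                      vbCrLf &
                      "2" & vbCrLf &
                      "00:00:04,000 --> 00:00:06,000" & vbCrLf &
                      "Second subtitle line" & vbCrLf
        For Each line In SubtitleTextOnly(srtText)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

vbCrLf is listed before vbLf so that Windows line endings are consumed whole instead of leaving a stray carriage return on each line.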
For a List, this is the best way to iterate in reverse:
For index As Int32 = sList.Count - 1 To 0 Step -1
    Dim line = sList(index)
    ' inspect line here; calling sList.RemoveAt(index) is safe inside this loop
Next