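I want to search text files (*.txt) for a number of regex patterns. The files sit under a folder whose path I supply in a text box; that folder contains further subfolders, and the txt files live at paths of the form 12345\2031\30201\txt\120.txt. If a pattern matches in even one file, a string should be written to a log file that is created inside the folder given in the text box, and then the program should move on to the next regex, and so on. What I have so far is: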
Dim tLoc As String = TextBox1.Text
Dim txtFilesArray = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories).Where(Function(f) f Like "*\#*\#*\#*\txt\#*.txt")
Dim fileLoc As String = tLoc & "\Checklist.log"
Dim fs As FileStream = Nothing
If (Not File.Exists(fileLoc)) Then
    fs = File.Create(fileLoc)
    Using fs
    End Using
End If
For Each tFile In txtFilesArray
    Dim input As String = File.ReadAllText(tFile)
    Dim pattern1 As New Regex("(?<!>)(figure|fig\.|figs\.|figures) (\d+)")
    Dim pattern2 As New Regex("(?<!>)(table|tab\.|tabs\.|tables) (\d+)")
    If pattern1.IsMatch(input) Then
        FileOpen(1, fileLoc, OpenMode.Append)
        PrintLine(1, "Check figure link")
        FileClose()
    End If
    If pattern2.IsMatch(input) Then
        FileOpen(1, fileLoc, OpenMode.Append)
        PrintLine(1, "Check table link")
        FileClose()
    End If
Next
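But the problems are:
1) Even if pattern1 matches in several files, I want the string "Check figure link" written to the log file only once, not every time a match is found in another file, and likewise for pattern2 ... patternN. In addition, as soon as pattern1 matches in one file, I want the program to move on to the next regex pattern (there is no need to look for the same pattern in the remaining files).
2) I want to use about a hundred regex patterns in this program. Can anyone tell me how to shorten the code?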
Answer 0 (score: 1)
You can put the patterns into some kind of collection and remove each one from it once it has been found:
Dim re = Function(p$) New Regex(p, RegexOptions.Compiled)
' Key: the line to write to the log; Value: the pattern that triggers it
Dim patterns = New Dictionary(Of String, Regex) From {
    {"Check figure link", re("(?<!>)(figure|fig\.|figs\.|figures) (\d+)")},
    {"Check table link", re("(?<!>)(table|tab\.|tabs\.|tables) (\d+)")}
}
Dim output = New List(Of String)
Dim tLoc = TextBox1.Text
Dim txtFiles = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories)
For Each tFile In txtFiles
    If patterns.Count = 0 Then Exit For ' every pattern has already been found
    If Not tFile Like "*\#*\#*\#*\txt\#*.txt" Then Continue For
    Dim input = File.ReadAllText(tFile)
    ' Test every remaining pattern against this file, not just the first one that matches.
    ' ToList() takes a snapshot so entries can be removed from the dictionary inside the loop.
    For Each pattern In patterns.ToList()
        If pattern.Value.IsMatch(input) Then
            output.Add(pattern.Key)
            patterns.Remove(pattern.Key)
        End If
    Next
Next
File.WriteAllLines(Path.Combine(tLoc, "Checklist.log"), output)
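With around a hundred patterns, the dictionary initializer above gets long. One option is to keep the label/pattern pairs in a text file and build the dictionary from it. The sketch below assumes a made-up file format of one pair per line, with the label and the regex separated by a tab; the names PatternLoader and ParsePatternLines are hypothetical and not part of the original code.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module PatternLoader
    ' Hypothetical helper: turns lines of the form "label<TAB>regex"
    ' into a label -> compiled Regex dictionary.
    Function ParsePatternLines(lines As IEnumerable(Of String)) As Dictionary(Of String, Regex)
        Dim patterns = New Dictionary(Of String, Regex)
        For Each line In lines
            ' Skip blank lines
            If String.IsNullOrWhiteSpace(line) Then Continue For
            ' Split at the first tab only, so the regex itself may contain tabs
            Dim parts = line.Split(New Char() {ControlChars.Tab}, 2)
            ' Lines without a tab are treated as malformed and ignored (an assumption)
            If parts.Length < 2 Then Continue For
            patterns(parts(0).Trim()) = New Regex(parts(1), RegexOptions.Compiled)
        Next
        Return patterns
    End Function
End Module
```

A call such as ParsePatternLines(File.ReadLines("patterns.txt")), where patterns.txt is a placeholder file name, could then stand in for the hard-coded initializer. If two lines share a label, the later one overwrites the earlier one.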
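If you would rather compare each pattern against all the files, it is easier to parallelize (run on several processors at once), because nothing has to be removed from the collection: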
Dim patterns = New List(Of String()) From {
    ({"Check figure link", "(?<!>)(figure|fig\.|figs\.|figures) (\d+)"}),
    ({"Check table link", "(?<!>)(table|tab\.|tabs\.|tables) (\d+)"})}
' Read the control once, on the UI thread, before any worker threads start
Dim tLoc = TextBox1.Text
' Thread-safe collection shared by the workers (needs Imports System.Collections.Concurrent)
Dim output = New ConcurrentBag(Of String)
Parallel.ForEach(patterns,
    Sub(pattern)
        Dim txtFiles = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories)
        Dim regEx = New Regex(pattern(1), RegexOptions.Compiled)
        For Each tFile In txtFiles
            If tFile Like "*\#*\#*\#*\txt\#*.txt" Then
                Dim input = File.ReadAllText(tFile)
                If regEx.IsMatch(input) Then
                    output.Add(pattern(0))
                    Exit For ' first match is enough for this pattern
                End If
            End If
        Next
    End Sub)
' Append once, after all workers have finished, so no two threads open the log at the same time
File.AppendAllLines(Path.Combine(tLoc, "Checklist.log"), output)
Or a shorter, more intricate version:
Dim patterns = New List(Of String()) From {
    ({"Check figure link", "(?<!>)(figure|fig\.|figs\.|figures) (\d+)"}),
    ({"Check table link", "(?<!>)(table|tab\.|tabs\.|tables) (\d+)"})}
Dim tLoc = TextBox1.Text
Dim allTxt = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories)
Dim txtFiles = allTxt.Where(Function(f) f Like "*\#*\#*\#*\txt\#*.txt").ToList()
' Any() stops at the first matching file, so each label is emitted at most once.
' (A single Take 1 over the flattened query would keep only one result in total.)
Dim output = From pattern In patterns.AsParallel()
             Let regEx = New Regex(pattern(1), RegexOptions.Compiled)
             Where txtFiles.Any(Function(f) regEx.IsMatch(File.ReadAllText(f)))
             Select pattern(0)
File.WriteAllLines(Path.Combine(tLoc, "Checklist.log"), output)