How can I match multiple regex patterns across multiple files and write something to a log file?

Time: 2016-08-28 07:12:04

Tags: regex vb.net

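I want to search the *.txt files under a folder for a number of regex patterns. The folder path is entered in a text box, and the folder contains further subfolders that hold the txt files, in the form 12345\2031\30201\txt\120.txt. If a pattern matches in even one file, a string should be written to a log file, which is created inside the folder whose path I entered in the text box. The program should then move on to the next regex, and so on. This is what I have so far: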

Dim tLoc As String = TextBox1.Text
Dim txtFilesArray = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories).Where(Function(f) f Like "*\#*\#*\#*\txt\#*.txt")
Dim fileLoc As String = tLoc & "\Checklist.log"
Dim fs As FileStream = Nothing
If (Not File.Exists(fileLoc)) Then
    fs = File.Create(fileLoc)
    Using fs

    End Using
End If
For Each tFile In txtFilesArray
    Dim input As String = File.ReadAllText(tFile)
    Dim pattern1 As New Regex("(?<!>)(figure|fig\.|figs\.|figures) (\d+)")
    Dim pattern2 As New Regex("(?<!>)(table|tab\.|tabs\.|tables) (\d+)")
    If pattern1.IsMatch(input) Then
        FileOpen(1, fileLoc, OpenMode.Append)
        PrintLine(1, "Check figure link")
        FileClose()
    End If
    If pattern2.IsMatch(input) Then
        FileOpen(1, fileLoc, OpenMode.Append)
        PrintLine(1, "Check table link")
        FileClose()
    End If

Next

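But the problems are:
1) Even if pattern1 matches in several files, I want the string "Check figure link" written to the log file only once, not every time a match is found in a different file. The same goes for pattern2 ... patternN. In addition, as soon as pattern1 matches in one file, the program should move on to the next regex pattern (there is no need to look for the same pattern in the other files).
2) I want to use about a hundred regex patterns in this program. Can anyone tell me how to shorten the code?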

1 answer:

Answer 0 (score: 1)

You can put the patterns in some kind of collection and remove each one from it once it has been found:

Dim re = Function(p$) New Regex(p, RegexOptions.Compiled)
' Key = line to write to the log, Value = pattern that triggers it
Dim patterns = New Dictionary(Of String, Regex) From {
    {"Check figure link", re("(?<!>)(figure|fig\.|figs\.|figures) (\d+)")},
    {"Check table link", re("(?<!>)(table|tab\.|tabs\.|tables) (\d+)")}
}
Dim output = New List(Of String)
Dim tLoc = TextBox1.Text
Dim txtFiles = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories)

For Each tFile In txtFiles
    If patterns.Count = 0 Then Exit For ' every pattern has already been found
    If Not tFile Like "*\#*\#*\#*\txt\#*.txt" Then Continue For
    Dim input = File.ReadAllText(tFile)

    ' Loop over a snapshot (ToList) so entries can be removed inside the loop;
    ' one file may match several patterns, so do not stop at the first hit
    For Each pattern In patterns.ToList()
        If pattern.Value.IsMatch(input) Then
            output.Add(pattern.Key)
            patterns.Remove(pattern.Key)
        End If
    Next
Next
File.WriteAllLines(tLoc.TrimEnd("\"c) & "\Checklist.log", output)
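With around a hundred patterns, as mentioned in the question, a dictionary literal gets long. A minimal sketch of one alternative, assuming the patterns are kept in a tab-separated text file with one "log message<TAB>regex" pair per line. The LoadPatterns helper, the PatternLoader module and the file layout are all hypothetical and not part of the code above:

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions
Imports Microsoft.VisualBasic

Module PatternLoader
    ' Hypothetical helper: reads "log message<TAB>regex" pairs from a text file.
    ' Blank lines and lines without a tab are skipped.
    Function LoadPatterns(path As String) As Dictionary(Of String, Regex)
        Dim result = New Dictionary(Of String, Regex)
        For Each line In File.ReadLines(path)
            If String.IsNullOrWhiteSpace(line) Then Continue For
            ' Split on the first tab only, so the regex itself may contain tabs
            Dim parts = line.Split(New Char() {ControlChars.Tab}, 2)
            If parts.Length < 2 Then Continue For
            result(parts(0).Trim()) = New Regex(parts(1), RegexOptions.Compiled)
        Next
        Return result
    End Function
End Module
```

The returned dictionary has the same shape as the patterns dictionary above, so it could be passed to the same loop. Adding a pattern then only means adding a line to the file.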

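If you would rather check each pattern against all the files, it is easier to parallelize (run on several processors at once), because the patterns do not have to be removed from the collection: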

Dim patterns = New List(Of String()) From {
    ({"Check figure link", "(?<!>)(figure|fig\.|figs\.|figures) (\d+)"}),
    ({"Check table link", "(?<!>)(table|tab\.|tabs\.|tables) (\d+)"})}

' Read the text box once, up front, rather than from inside the worker threads
Dim tLoc = TextBox1.Text
' Thread-safe collection; needs Imports System.Collections.Concurrent
Dim output = New ConcurrentBag(Of String)

Parallel.ForEach(patterns,
    Sub(pattern)
        Dim txtFiles = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories)
        Dim regEx = New Regex(pattern(1), RegexOptions.Compiled)

        For Each tFile In txtFiles
            If tFile Like "*\#*\#*\#*\txt\#*.txt" Then
                Dim input = File.ReadAllText(tFile)
                If regEx.IsMatch(input) Then
                    output.Add(pattern(0))
                    Exit For ' one hit is enough for this pattern
                End If
            End If
        Next
    End Sub)
' Write once, after all workers have finished, so threads never compete for the log file
File.AppendAllLines(tLoc.TrimEnd("\"c) & "\Checklist.log", output)

Or a shorter but more complicated version:

Dim patterns = New List(Of String()) From {
    ({"Check figure link", "(?<!>)(figure|fig\.|figs\.|figures) (\d+)"}),
    ({"Check table link", "(?<!>)(table|tab\.|tabs\.|tables) (\d+)"})}

Dim tLoc = TextBox1.Text
' List the candidate files once and share the list between all patterns
Dim txtFiles = Directory.EnumerateFiles(tLoc, "*.txt", SearchOption.AllDirectories).
    Where(Function(f) f Like "*\#*\#*\#*\txt\#*.txt").ToList()

' Any() stops reading files at the first match for a given pattern
Dim output = From pattern In patterns.AsParallel()
             Let regEx = New Regex(pattern(1), RegexOptions.Compiled)
             Where txtFiles.Any(Function(f) regEx.IsMatch(File.ReadAllText(f)))
             Select pattern(0)

File.WriteAllLines(tLoc.TrimEnd("\"c) & "\Checklist.log", output)
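
Before running a pattern over the whole folder tree, it can help to try it on a few strings. The sketch below uses the figure pattern from the code above; the sample strings and the LookbehindDemo module are made up for illustration. The (?<!>) prefix is a negative lookbehind, so a match is rejected when the character immediately before it is >, and the pattern is case-sensitive because RegexOptions.IgnoreCase is not set:

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module LookbehindDemo
    Sub Main()
        Dim figurePattern = New Regex("(?<!>)(figure|fig\.|figs\.|figures) (\d+)")

        ' Plain reference: matches
        Console.WriteLine(figurePattern.IsMatch("see figure 2"))      ' True
        ' Abbreviated and plural forms: match
        Console.WriteLine(figurePattern.IsMatch("see figs. 3"))       ' True
        Console.WriteLine(figurePattern.IsMatch("see figures 4"))     ' True
        ' Directly preceded by ">": rejected by the lookbehind
        Console.WriteLine(figurePattern.IsMatch("<a>figure 2</a>"))   ' False
        ' Capitalized: rejected, the pattern is case-sensitive
        Console.WriteLine(figurePattern.IsMatch("see Figure 2"))      ' False
    End Sub
End Module
```

If capitalized references such as "Figure 2" should also be reported, RegexOptions.IgnoreCase could be combined with RegexOptions.Compiled when the Regex is created.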