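Comparing a list against a list while filtering/ignoring timestamps

Time: 2013-08-29 23:04:13

Tags: vb.net

I'm trying to create a program that validates the contents of file B (possibly bad) against file A (known good). It should remove every known-good line from the potentially bad file and leave only the potentially bad lines. The problem I'm running into is that every line contains a timestamp. How can I compare the content of each line that comes after the timestamp?

For example, file A: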

MSI (c) (74:80) [08:09:43:718]: Resetting cached policy values
MSI (c) (74:80) [08:09:43:718]: Machine policy value 'Debug' is 0
MSI (c) (74:80) [08:09:43:718]: ******* RunEngine:

compared against file B:

MSI (c) (E8:DC) [18:35:18:573]: Resetting cached policy values
MSI (c) (E8:DC) [18:35:18:573]: Machine policy value 'Debug' is 0
MSI (c) (E8:DC) [18:35:18:573]: ******* RunEngine:

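These should all be considered equal. I don't have an example of a line that differs, but it is essentially whatever is left once the matching lines have been removed.

My code so far: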

Imports System.IO

Public Class Form1
Dim compto As New List(Of String)
Dim compfrom As New List(Of String)

Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
    Standard("filea.LOG")
    Readfile("fileb.LOG")
    Writefile("difference.txt")
End Sub


Public Sub Standard(ByVal Path As String)
    Using r As StreamReader = New StreamReader(Path)
        Dim line As String = Nothing
        line = r.ReadLine
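        Do While (Not line Is Nothing)
            ' Process the current line before reading the next one, so the
            ' first line is not skipped and Nothing is never added at EOF
            If Not compto.Contains(line) Then compto.Add(line)
            line = r.ReadLine
        Loop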
    End Using
End Sub

Public Sub Readfile(ByVal Path As String)
    Dim pattern As String = "{30}([A-Za-z0-9\-]+"
    Using r As StreamReader = New StreamReader(Path)
        Dim line As String = Nothing
        line = r.ReadLine
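        Do While (Not line Is Nothing)
            ' Process the current line before reading the next one, so the
            ' first line is not skipped and Nothing is never added at EOF
            If Not compto.Contains(line) Then compfrom.Add(line)
            line = r.ReadLine
        Loop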
    End Using
End Sub

Public Sub Writefile(ByVal Path As String)
    Using sw As StreamWriter = New StreamWriter(Path)
        For Each line As String In compfrom
            sw.WriteLine(line)
            ListBox1.Items.Add(line)
        Next
    End Using
End Sub

End Class

So far this code removes exact matches, but this is where I'm stuck. Any help would be great.

Thanks.

Solution edit:

' Requires Imports System.IO, System.Linq and System.Text.RegularExpressions
Public Sub Writefile(ByVal Path As String)
    ' Keep only the text after the "]: " that ends the timestamp
    Dim GetLine As Func(Of String, String) = Function(line) Regex.Match(line, "\]: (.*)").Groups(1).Value
    Dim Diff As New HashSet(Of String)(File.ReadLines("filea.log").Select(GetLine))
    Diff.SymmetricExceptWith(File.ReadLines("fileb.log").Select(GetLine))
    Using sw As StreamWriter = New StreamWriter(Path)
        For Each line As String In Diff
            sw.WriteLine(line)
            ListBox1.Items.Add(line)
        Next
    End Using
End Sub

2 answers:

Answer 0: (score: 2)

Based on this link, try this:

Dim GetLine As Func(Of String,String) = Function(line) Regex.Match(line,"\]: (.*)").Groups(1).Value

'IF the timestamp is always at the same position, it may be more efficient to 
'avoid regular expressions. YMMV
GetLine = Function(line) line.Substring(32)

Dim Diff = New HashSet(Of String)(File.ReadLines("filea.LOG").Select(GetLine))
Diff.SymmetricExceptWith(File.ReadLines("fileb.LOG").Select(GetLine))
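
One caveat: SymmetricExceptWith keeps the lines that appear in either file but not in both, so lines found only in the known-good file also end up in the result. If only the lines unique to file B are wanted, as the question describes, a one-directional filter may be a better fit. The following is a minimal sketch, assuming the message starts after the first "]: " on each line. The names LogDiff, StripPrefix and OnlyInCandidate are hypothetical and not from the original post.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module LogDiff
    ' Hypothetical helper: returns the text after the first "]: ",
    ' or the whole line when no such marker is present.
    Function StripPrefix(line As String) As String
        Dim m As Match = Regex.Match(line, "\]: (.*)")
        Return If(m.Success, m.Groups(1).Value, line)
    End Function

    ' Returns the lines of candidate whose message part never occurs in good.
    ' The original lines (timestamp included) and their order are preserved.
    Function OnlyInCandidate(good As IEnumerable(Of String),
                             candidate As IEnumerable(Of String)) As List(Of String)
        Dim known As New HashSet(Of String)(good.Select(Function(l) StripPrefix(l)))
        Return candidate.Where(Function(l) Not known.Contains(StripPrefix(l))).ToList()
    End Function
End Module
```

With System.IO imported, this could be called as OnlyInCandidate(File.ReadLines("filea.LOG"), File.ReadLines("fileb.LOG")). Unlike the snippet above, a line with no "]: " marker is compared as a whole instead of collapsing to an empty string.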

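Answer 1: (score: 1)

You appear to be comparing each unique line in File A against each line in File B, where the line header MSI (c) (74:80) [08:09:43:718]: is irrelevant to the comparison and has a constant length.

You can change your code (4 instances) from: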

line = r.ReadLine

to:

line = r.ReadLine.Substring(32)

Substring() with a single integer argument returns the remainder of the string, starting at the specified character position.
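
Note that Substring(32) throws an ArgumentOutOfRangeException for a line shorter than 32 characters, and that ReadLine returns Nothing at the end of the file, so calling .Substring directly on its result raises a NullReferenceException there. A minimal guarded sketch follows, assuming every line carries the same 32-character prefix as the sample lines. The names LogText, PrefixLength and AfterPrefix are hypothetical.

```vbnet
Imports System

Module LogText
    ' Length of "MSI (c) (74:80) [08:09:43:718]: " in the sample lines.
    ' Assumption: every line has a prefix of this same width.
    Const PrefixLength As Integer = 32

    ' Returns Nothing for Nothing (end of file), the line unchanged when it is
    ' too short to carry the prefix, and otherwise the text after the prefix.
    Function AfterPrefix(line As String) As String
        If line Is Nothing Then Return Nothing
        If line.Length < PrefixLength Then Return line
        Return line.Substring(PrefixLength)
    End Function
End Module
```

In the reading loops, line = AfterPrefix(r.ReadLine) could then stand in for line = r.ReadLine.Substring(32), and the existing check for Nothing still ends the loop.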