Using vb.net

Time: 2015-06-25 16:26:28

Tags: html .net vb.net parsing

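I'm trying to get three values out of a large html file. I thought I could use the substring method, but I was told the position of the data may change. Basically, in the code below, I need to pick out "Total number of records : 106", "Number of records imported : 106" and "Number of records rejected : 0".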

<B>Total number of records : </B>106</Font><br><Font face="arial" size="2"><B>Number of records imported : </B>106</Font><br><Font face="arial" size="2"><B>Number of records rejected : </B>0</Font>

I hope that's clear. Thanks in advance!

1 answer:

Answer 0: (score: 1)

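Simple string operations like IndexOf() and Substring() should be enough to do the job. Regular expressions would be another approach that takes less code (and could allow more flexibility if the HTML tags can vary), but as Mark Twain might have said, I didn't have time for a short solution, so I wrote a long one.

In general, you'll get better results here if you first show that you've made at least a reasonable attempt and point out where you got stuck. But this time... here you go. :-)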

Private Shared Function GetMatchingCount(allInputText As String, textBefore As String, textAfter As String) As Integer?

    'Find the first occurrence of the text before the desired number
    Dim startPosition As Integer = allInputText.IndexOf(textBefore)

    'If text before was not found, return Nothing
    If startPosition < 0 Then Return Nothing

    'Move the start position to the end of the text before, rather than the beginning.
    startPosition += textBefore.Length

    'Find the first occurrence of text after the desired number
    Dim endPosition As Integer = allInputText.IndexOf(textAfter, startPosition)

    'If text after was not found, return Nothing
    If endPosition < 0 Then Return Nothing

    'Get the string found at the start and end positions
    Dim textFound As String = allInputText.Substring(startPosition, endPosition - startPosition)

    'Try converting the string found to an integer; return Nothing if it is not a valid integer
    Dim parsedValue As Integer
    If Integer.TryParse(textFound, parsedValue) Then Return parsedValue
    Return Nothing
End Function

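Of course, this only works if the text before and after the number is always the same. If you call it from a driver console application like this (but without Shared, since the function would then live in a Module)...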

Sub Main()
    Dim allText As String = "<B>Total number of records : </B>106</Font><br><Font face=""arial"" size=""2""><B>Number of records imported : </B>106</Font><br><Font face=""arial"" size=""2""><B>Number of records rejected : </B>0</Font>"

    Dim totalRecords As Integer? = GetMatchingCount(allText, "<B>Total number of records : </B>", "<")
    Dim recordsImported As Integer? = GetMatchingCount(allText, "<B>Number of records imported : </B>", "<")
    Dim recordsRejected As Integer? = GetMatchingCount(allText, "<B>Number of records rejected : </B>", "<")

    Console.WriteLine("Total: {0}", totalRecords)
    Console.WriteLine("Imported: {0}", recordsImported)
    Console.WriteLine("Rejected: {0}", recordsRejected)
    Console.ReadKey()
End Sub

...you'll get output like this:

Total: 106
Imported: 106
Rejected: 0
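
The answer mentions regular expressions as a shorter alternative that tolerates some variation in the markup. A minimal sketch of that idea follows; it is not part of the original answer. The helper name GetCountByLabel and the module name RegexSketch are made up for illustration, and the pattern assumes the number follows the label, a colon, and optionally one or more tags.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module RegexSketch

    ' Hypothetical helper (not from the original answer): returns the integer
    ' that follows the given label, or Nothing when no match is found.
    Function GetCountByLabel(html As String, label As String) As Integer?
        ' Match the label literally, then a colon with optional whitespace,
        ' then skip any run of tags (such as </B>) and capture the digits.
        Dim pattern As String = Regex.Escape(label) & "\s*:\s*(?:<[^>]*>\s*)*(\d+)"
        Dim m As Match = Regex.Match(html, pattern, RegexOptions.IgnoreCase)

        ' Label not present, or no digits after it
        If Not m.Success Then Return Nothing

        ' TryParse also rejects digit runs too large for an Integer
        Dim value As Integer
        If Integer.TryParse(m.Groups(1).Value, value) Then Return value
        Return Nothing
    End Function

    Sub Main()
        Dim allText As String = "<B>Total number of records : </B>106</Font><br><Font face=""arial"" size=""2""><B>Number of records imported : </B>106</Font><br><Font face=""arial"" size=""2""><B>Number of records rejected : </B>0</Font>"

        Console.WriteLine("Total: {0}", GetCountByLabel(allText, "Total number of records"))
        Console.WriteLine("Imported: {0}", GetCountByLabel(allText, "Number of records imported"))
        Console.WriteLine("Rejected: {0}", GetCountByLabel(allText, "Number of records rejected"))
    End Sub

End Module
```

Unlike the IndexOf() version, this does not depend on the exact tag that follows the label, so a change from </B> to </strong>, or extra whitespace around the colon, should not break it. It still assumes the label text itself stays the same.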