VB.NET: Need help parsing XML loaded into a RichTextBox (fetched from a URL)

Time: 2011-12-15 02:39:25

Tags: xml vb.net parsing

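I have two rich text boxes and two buttons on the form. The first button fetches HTML from a URL, converts it to XML, and places the result in rich text box 1.

The second button is supposed to take the XML from rich text box 1 and parse it to collect all input elements by ID.

My problem is that my parser does nothing. My guess is that I am not actually getting the XML from the first rich text box.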

Is the best way to get the XML from the rich text box to load it into memory and then parse it to find all the ID attributes?

Here is my code. Thanks for your help.

Imports mshtml
Imports System.Text
Imports System.Net
Imports System.Xml
Imports System.IO
Imports System.Xml.XPath

Public Class Scraper

    Private Sub Scraper_Load(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles MyBase.Load
    End Sub

    Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
        '  Note: This example uses two Chilkat products: Chilkat HTTP
        '  and Chilkat HTML-to-XML.  The "Chilkat Bundle" can be licensed
        '  at a price that is less than purchasing each product individually.
        '  The "Chilkat Bundle" provides licenses to all existing Chilkat components.  Also, new-version upgrades are always free.

        Dim http As New Chilkat.Http()

        '  Any string argument automatically begins the 30-day trial.
        Dim success As Boolean
        success = http.UnlockComponent("30-day trial")
        If (success <> True) Then
            TextBox1.Text = TextBox1.Text & http.LastErrorText & vbCrLf
            Exit Sub
        End If

        Dim html As String
        html = http.QuickGetStr("http://www.quiltingboard.com/register.php")
        If (html = vbNullString) Then
            TextBox1.Text = TextBox1.Text & http.LastErrorText & vbCrLf
            Exit Sub
        End If

        Dim htmlToXml As New Chilkat.HtmlToXml()

        '  Any string argument automatically begins the 30-day trial.
        success = htmlToXml.UnlockComponent("30-day trial")
        If (success <> True) Then
            TextBox1.Text = TextBox1.Text & htmlToXml.LastErrorText & vbCrLf
            Exit Sub
        End If

        '  Indicate the charset of the output XML we'll want.
        htmlToXml.XmlCharset = "utf-8"

        '  Set the HTML:
        htmlToXml.Html = html

        '  Convert to XML:
        Dim xml As String
        xml = htmlToXml.ToXml()

        '  Save the XML to a file.
        '  Make sure your charset here matches the charset
        '  used for the XmlCharset property.
        htmlToXml.WriteStringToFile(xml, "out.xml", "utf-8")

        RichTextBox1.Text = xml
    End Sub

    Private Sub LoopThroughXmlDoc(ByVal nodeList As XmlNodeList)
        For Each elem As XmlElement In nodeList
            If elem.HasChildNodes Then
                LoopThroughXmlDoc(elem.ChildNodes)
            Else
                '' Extract the information
                If elem.HasAttribute("id") Then
                    'elem.Attributes("AssetID").Value.ToString()
                ElseIf elem.HasAttribute("name") Then
                    'elem.Attributes("AttributeID").Value.ToString()
                End If
            End If
        Next
    End Sub

    Private Sub Button2_Click_1(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button2.Click
        Dim doc As XmlDocument = New XmlDocument
        doc.Load("xmlFile.xml")
        Dim nodeList As XmlNodeList = doc.GetElementsByTagName("input")
        LoopThroughXmlDoc(nodeList)
    End Sub
End Class

1 answer:

Answer 0: (score: 0)

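The second button does not take the XML from the RichTextBox; it tries to load it from xmlFile.xml.

That file is not the one saved in Button1, which is out.xml.

If the user can change the XML in the RichTextBox, the solution is to change the code in Button2 to retrieve the text from the RTB.

Otherwise, the solution is to change the name of the file read in Button2 to out.xml.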
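
As a minimal sketch of the first option, the XML text can be parsed directly from a string with XmlDocument.LoadXml instead of XmlDocument.Load, which expects a file path, stream, or reader. The module name InputIdExtractor, the helper GetInputIds, and the sample markup are hypothetical names introduced for illustration; the sketch also assumes the converted XML uses lowercase input tag names, since GetElementsByTagName is case-sensitive.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Xml

Module InputIdExtractor

    ' Hypothetical helper: parses an XML string (for example RichTextBox1.Text)
    ' and returns the id attribute of every <input> element that has one.
    Function GetInputIds(ByVal xmlText As String) As List(Of String)
        Dim ids As New List(Of String)()
        Dim doc As New XmlDocument()

        ' LoadXml parses XML held in a string; Load would treat the
        ' argument as a file name instead.
        doc.LoadXml(xmlText)

        ' GetElementsByTagName searches the whole document, so no manual
        ' recursion over child nodes is needed.
        For Each node As XmlNode In doc.GetElementsByTagName("input")
            Dim elem As XmlElement = TryCast(node, XmlElement)
            If elem IsNot Nothing AndAlso elem.HasAttribute("id") Then
                ids.Add(elem.GetAttribute("id"))
            End If
        Next

        Return ids
    End Function

    Sub Main()
        ' Assumed sample markup standing in for the converted page.
        Dim sample As String =
            "<root><form>" &
            "<input id=""username"" type=""text"" />" &
            "<input name=""noid"" />" &
            "<input id=""email"" />" &
            "</form></root>"

        For Each id As String In GetInputIds(sample)
            Console.WriteLine(id)
        Next
    End Sub

End Module
```

In the form from the question, Button2's handler could call GetInputIds(RichTextBox1.Text) and display the returned list. If the text in the RichTextBox is not well-formed XML, LoadXml throws an XmlException, so wrapping the call in a Try/Catch block is likely worthwhile when the user can edit the text.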