Extracting HTML element attributes

Asked: 2013-09-16 17:02:40

Tags: html vb.net visual-studio-2008

I'm new to VB.NET 2008, and what I want to do is extract some elements from the HTML below:

<TABLE>
  <TR>
    <TD></TD>
    <TH scope="col">PAT. NO.</TH><TD></TD><TH scope="col">Title</TH>
  </TR>
  <TR>
    <TD valign=top>
      10
    </TD>
    <TD valign=top>
      <A  HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=10&p=1&f=G&l=50&d=PTXT&S1=*a&OS=*a&RS=*a>8,519,110</A>
    </TD>
    <TD valign=baseline>
      <IMG border=0 src="/netaicon/PTO/ftext.gif" alt="Full-Text">
    </TD>
    <TD valign=top>
      <A  HREF=/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=10&p=1&f=G&l=50&d=PTXT&S1=*a&OS=*a&RS=*a>mRNA cap analogs</A>
    </TD>

So I would like my text box to show the following:

/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=10&p=1&f=G&l=50&d=PTXT&S1=*a&OS=*a&RS=*a

8,519,110

mRNA cap analogs

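The HTML markup above repeats for more table rows, and I want to get all of those rows. I have read that "GetAttribute" can be used to read an HTML element's attributes, but I only want to extract the specific parts shown above.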

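2 answers:

Answer 0: (score: 1)

Without understanding why you want to do this, it's a bit hard to give you a good solution.

I'll offer two options:

1) VB.NET - It's not clear how you are setting the attributes in the HTML. You should be able to do something like this (note: this is from my memory of VB.NET and written by hand here, not in VS.NET):

HTML view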

<asp:HyperLink id="FirstLink" runat="server" />
...

Code-behind

FirstLink.NavigateUrl = yourUrlVariableHere
...
YourInputBox.Text = String.Concat(yourUrlVariableHere, yourOtherVariablesHere ...)

2) jQuery -

Basically, you want to read the attributes and then display them:

$(function(){
    var anchor1 = $("#firstAnchor").attr("href");
    var imageSrc = $("#my-image").attr("src");

    $("#my-display").html(anchor1+ "<br/>" + imageSrc );
});

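Full sample here

Answer 1: (score: 1)

I have a routine that I've been using to extract data from HTML tables (sorry, I can't credit the original author; I found this code and don't know where it came from). It takes the HTML as a string, parses the tables in it, and loads the cells into a DataSet.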

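' Requires "Imports System.Data" and "Imports System.Text.RegularExpressions" at the top of the file.
Public Shared Function ConvertHtmlTablesToDataSet(html As String) As DataSet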
    Dim dt As DataTable
    Dim ds As New DataSet()
    dt = New DataTable()
    Dim tableExpression As String = "<table[^>]*>(.*?)</table>"
    Dim headerExpression As String = "<th[^>]*>(.*?)</th>"
    Dim rowExpression As String = "<tr[^>]*>(.*?)</tr>"
    Dim columnExpression As String = "<td[^>]*>(.*?)</td>"
    Dim headersExist As Boolean = False
    Dim iCurrentColumn As Integer = 0
    Dim iCurrentRow As Integer = 0

    Dim tables As MatchCollection = Regex.Matches(html, tableExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase)


    For Each table As Match In tables
        iCurrentRow = 0
        headersExist = False
        dt = New DataTable()

        If table.Value.Contains("<th") Then
            headersExist = True

            Dim headers As MatchCollection = Regex.Matches(table.Value, headerExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase)

            For Each header As Match In headers
                dt.Columns.Add(header.Groups(1).ToString())
            Next
        Else

            ' No <th> cells: count the <td> cells in the first row and generate column names.
            Dim firstRow As String = Regex.Match(table.Value, rowExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase).Value
            Dim columnCount As Integer = Regex.Matches(firstRow, columnExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase).Count

            For iColumns As Integer = 1 To columnCount
                dt.Columns.Add("Column " & System.Convert.ToString(iColumns))
            Next
        End If

        Dim rows As MatchCollection = Regex.Matches(table.Value, rowExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase)
        Try

            For Each row As Match In rows
                If Not ((iCurrentRow = 0) And headersExist) Then
                    Dim dr As DataRow = dt.NewRow()
                    iCurrentColumn = 0

                    Dim columns As MatchCollection = Regex.Matches(row.Value, columnExpression, RegexOptions.Singleline Or RegexOptions.Multiline Or RegexOptions.IgnoreCase)

                    For Each column As Match In columns
                        dr(iCurrentColumn) = column.Groups(1).ToString()
                        iCurrentColumn += 1
                        If iCurrentColumn = dt.Columns.Count Then Exit For
                    Next

                    dt.Rows.Add(dr)
                End If
                iCurrentRow += 1
            Next

            ds.Tables.Add(dt)
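        Catch ex As Exception
            ' Stop only breaks into an attached debugger, and the failing table is
            ' silently left out of the DataSet; log ex or rethrow in production code.
            Stop
        End Try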
    Next

    Return ds
End Function
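
The cells this routine returns still hold the raw <A HREF=...> markup, so getting the link target and the link text needs a second pass. Also note that for the markup in the question, which has two <TH> headers but four <TD> cells per row, the routine keeps only the first two cells of each row, so the title link is dropped; running the second pass on the raw HTML may be simpler. Below is a minimal sketch of that pass. The module name AnchorExtractor, the function ExtractAnchors, and the pattern itself are my own illustrative assumptions, not part of the routine above, and the URL in Main is a shortened version of the one in the question.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module AnchorExtractor
    ' Hypothetical pattern (not from the routine above): captures the href value,
    ' whether double-quoted, single-quoted, or unquoted, plus the link text.
    Private Const AnchorExpression As String = _
        "<a\s[^>]*?href\s*=\s*(?:""(?<url>[^""]*)""|'(?<url>[^']*)'|(?<url>[^\s>]+))[^>]*>(?<text>.*?)</a>"

    ' Returns one (href, text) pair per anchor found in the given HTML fragment.
    Public Function ExtractAnchors(ByVal html As String) As List(Of KeyValuePair(Of String, String))
        Dim results As New List(Of KeyValuePair(Of String, String))()
        Dim options As RegexOptions = RegexOptions.Singleline Or RegexOptions.IgnoreCase

        For Each anchor As Match In Regex.Matches(html, AnchorExpression, options)
            results.Add(New KeyValuePair(Of String, String)( _
                anchor.Groups("url").Value, anchor.Groups("text").Value.Trim()))
        Next
        Return results
    End Function

    Sub Main()
        ' Shortened sample cell; the href is unquoted, as in the question's HTML.
        Dim cell As String = "<A  HREF=/netacgi/nph-Parser?Sect1=PTO2&r=10>8,519,110</A>"
        For Each pair As KeyValuePair(Of String, String) In ExtractAnchors(cell)
            Console.WriteLine(pair.Key)    ' the href value
            Console.WriteLine(pair.Value)  ' the link text
        Next
    End Sub
End Module
```

For the sample cell in Main this should print the href followed by 8,519,110. In a Windows Forms project you would append each pair to your text box instead of writing to the console. Like the routine above, this relies on regular expressions, so it can break on unusual markup; an HTML parser is more robust if the pages vary.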