提取html标记中的文本并将其作为表标题

时间:2015-04-30 18:21:43

标签: html excel excel-vba vba

现在表格标题设置为

tblNameArr = Array("Balance Sheet", "Cash Flow", "Header 3", "Header 4")

如何根据html标签中的文字更改四个表标题的名称?

e.g。我希望第一个表名改为"重要财务指标"名称是" a1"并且id也是" a1",并将下面的表名更改为"资产负债表","现金流量表"和"综合损益表"相应?

请参阅下面的html代码。

Sub GetFinanceData()
For x = 1 To 10
Dim URL As String, elemCollection As Object
Dim t As Integer, r As Integer, c As Integer

Worksheets("Stocks").Select
Worksheets("Stocks").Activate

'Open IE and Go to the Website

URL = "http://stock.finance.sina.com.cn/hkstock/finance/00001.html"
URL = Cells(x, 1)

Set IE = CreateObject("InternetExplorer.Application")
With IE
    .navigate URL
    .Visible = True

    Do While .Busy = True Or .readyState <> 4
        Loop
    DoEvents

Worksheets.Add(After:=Worksheets(Worksheets.Count)).Name = _
ThisWorkbook.Worksheets("Stocks").Range("B" & x).Value     'You could even simplify it and just state the name as Cells(x,2)


'Select the Report Type

Set selectItems = IE.Document.getElementsByTagName("select")
    For Each i In selectItems
        i.Value = "zero"
        i.FireEvent ("onchange")
        Application.Wait (Now + TimeValue("0:00:05"))
    Next i

    Do While .Busy: DoEvents: Loop

ActiveSheet.Range("A1:K500").ClearContents

ActiveSheet.Range("A1").Value = .Document.getElementsByTagName("h1")(0).innerText
ActiveSheet.Range("B1").Value = .Document.getElementsByTagName("em")(0).innerText

'Find and Get Table Data

tblNameArr = Array("Balance Sheet", "Cash Flow", "Header 3", "Header 4")
tblStartRow = 5
Set elemCollection = .Document.getElementsByTagName("TABLE")
For t = 0 To elemCollection.Length - 1
    For r = 0 To (elemCollection(t).Rows.Length - 1)
        For c = 0 To (elemCollection(t).Rows(r).Cells.Length - 1)
        ActiveSheet.Cells(r + tblStartRow, c + 1) = elemCollection(t).Rows(r).Cells(c).innerText
    Next c
Next r

ActiveSheet.Cells(r + tblStartRow + 2, 1) = tblNameArr(t)
tblStartRow = tblStartRow + r + 4

Next t

End With

' cleaning up memory

IE.Quit

Next x


End Sub

以下是html代码:

<!--重要财务指标 start-->
<a name="a1" id="a1"></a>
<div class="part02">
  <div class="sub01">
      <div class="sub01_tt fblue">
          <span class=" selected"><a href="#a1" target="_self">重要财务指标</a></span>
          <span class=""><a href="#a2" target="_self">资产负债表</a></span>
          <span class=""><a href="#a3" target="_self">现金流量表</a></span>
          <span class=""><a href="#a4" target="_self">综合损益表</a></span>
          <em class="rt">报表类型:<select class="fgrey" style="width:100px;" interface="getFinanceStandardForjs?symbol=$symbol&financeStanderd=" table="tableGetFinanceStandard" onchange="selectData(this);">
            <option value="all" >全部</option>
            <option value="zero" >年报</option>
            <option value="1" >中报</option>
            <option value="2" >一季报</option>
            <option value="3" >三季报</option>
        </select></em>
    </div>

1 个答案:

答案 0 :(得分:0)

标记<a name="a1" id="a1"></a>创建的元素为空。它只是一个链接锚点。它不包含任何东西。所以获得这个元素是没用的。

可以采用一种方法,遍历所有A个元素并选择那些引用href="#a1"href="#a2",...

的方法

示例:

...

nameBalanceSheet = "Balance Sheet"
nameCashFlow = "Cash Flow"
nameHeader3 = "Header 3"
nameHeader4 = "Header 4"

Set elemCollection = .Document.getElementsByTagName("A")
For i = 0 To elemCollection.Length - 1
 If Right(elemCollection(i).href, 3) = "#a1" Then
  nameBalanceSheet = elemCollection(i).innerText
 ElseIf Right(elemCollection(i).href, 3) = "#a2" Then
  nameCashFlow = elemCollection(i).innerText
 ElseIf Right(elemCollection(i).href, 3) = "#a3" Then
  nameHeader3 = elemCollection(i).innerText
 ElseIf Right(elemCollection(i).href, 3) = "#a4" Then
  nameHeader4 = elemCollection(i).innerText
 End If
Next

tblNameArr = Array(nameBalanceSheet, nameCashFlow, nameHeader3, nameHeader4)

...