Extracting text from an HTML table

Time: 2015-08-19 22:48:11

Tags: vba web-scraping getelementbyid getelementsbytagname getelementsbyclassname

I am trying to extract various elements from this page:

http://partsurfer.hp.com/Search.aspx?searchText=4CE0460D0G

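I want to start with ctl00_BodyContentPlaceHolder_lblSerialNumber.

Surely there is a simple way to pull an element out of an HTML page when you know its ID? I thought something like getElementsByName, getElementById, or even getElementsByTagName would work, but try as I might, I can't get it to extract what I want!

This does not work: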

Function GetHPModelName()

    Dim ie As Object
    Dim Oelement As Object
    Dim Ohtml As New MSHTML.HTMLDocument
    Dim lrow As Integer

    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://partsurfer.hp.com/Search.aspx?searchText=" & Worksheets("HP_Lookup").Range("A2").Value, False
        .send
        Ohtml.body.innerHTML = .responseText
    End With

    FetchHPInfo "ctl00_BodyContentPlaceHolder_lblSerialNumber", "A", Oelement, Ohtml
End Function

which calls:

Public Function FetchHPInfo(tablename As String, thiscolumn As String, Oelement As Object, Ohtml As MSHTML.HTMLDocument)
    lrow = 1
    For Each Oelement In Ohtml.getElementsById(tablename)
        Worksheets("HP_main").Range(thiscolumn & lrow).Value = Oelement.innerText
        lrow = lrow + 1
    Next Oelement
    Worksheets("HP_main").Columns(thiscolumn).cells.HorizontalAlignment = xlHAlignLeft
    Worksheets("HP_main").Columns(thiscolumn).AutoFit
End Function

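1 Answer:

Answer 0: (score: 1)

getElementById() should be all you need, since the node has an ID attribute. You are probably running into trouble because you assign responseText to the document's body, but the document does not have a <body> node yet. Just use write() to write the entire response into the empty document. Here is an example I threw together that returns the correct value: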

Dim objHttp
Set objHttp = CreateObject("MSXML2.XMLHTTP")
objHttp.Open "GET", "http://partsurfer.hp.com/Search.aspx?searchText=4CE0460D0G", False
objHttp.Send

Dim doc
Set doc = CreateObject("htmlfile")
doc.write objHttp.responseText

MsgBox doc.getElementById("ctl00_BodyContentPlaceHolder_lblSerialNumber").innerText

Output:

4CE0460D0G
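
To tie this back to the worksheet code in the question, here is a rough sketch of how the same approach might look as a reusable VBA function. Note that the DOM has no getElementsById method: getElementById returns a single element (or nothing), so there is no collection to loop over with For Each. The names GetElementTextById and FillSerialNumber are made up for illustration, and the sketch assumes the page still serves that ID in its static HTML; treat it as a starting point rather than a tested solution.

```vba
' Hypothetical helper (not from the original post): fetches a page and
' returns the inner text of the element with the given ID, or an empty
' string if the request fails or the element is missing.
Function GetElementTextById(url As String, elementId As String) As String
    Dim http As Object
    Dim doc As Object
    Dim node As Object

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False
    http.send
    If http.Status <> 200 Then Exit Function

    ' Write the whole response into an empty document instead of
    ' assigning it to body.innerHTML, as the answer suggests.
    Set doc = CreateObject("htmlfile")
    doc.write http.responseText
    doc.Close

    ' getElementById returns one element, so no loop is needed.
    Set node = doc.getElementById(elementId)
    If Not node Is Nothing Then
        GetElementTextById = node.innerText
    End If
End Function

' Hypothetical caller: sheet names and cells are taken from the question.
Sub FillSerialNumber()
    Dim serial As String
    serial = Worksheets("HP_Lookup").Range("A2").Value

    Worksheets("HP_main").Range("A1").Value = GetElementTextById( _
        "http://partsurfer.hp.com/Search.aspx?searchText=" & serial, _
        "ctl00_BodyContentPlaceHolder_lblSerialNumber")
End Sub
```

Because the helper uses late binding throughout, it should not need a reference to the Microsoft HTML Object Library; other label IDs on the page could be fetched by calling it again with a different elementId.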