Scraping a web page with VBA

Date: 2016-10-15 23:02:47

Tags: excel vba web-scraping screen-scraping

I am trying to scrape information from multiple websites.

<div class="detailSection">
        <span>Officer/Director Detail</span>
            <span><b>Name & Address</b></span>
            <br/>
            <br/>
     <span>Title&nbsp;VD</span>
     <br/>
     <br/>
GUNN, BETTY    <span>

<div>
6922 SOUTH LAGOON DR<br/>
         PANAMA CITY BEACH, FL 32408<br/>
</div>

I am able to extract all of the information except the name "GUNN, BETTY".

The web page is http://search.sunbiz.org/Inquiry/CorporationSearch/SearchResultDetail?inquiryType=DocumentNumber&aggregateId=domnp-763425-68d63992-2677-4bd5-9e1e-3f63ef505809&directionType=Initial&searchNameOrder=AMBASSADORBEACHOWNERSASSOCIATI%207634250&searchTerm=763425

Officer_Director_Detail2 = Doc.getElementsByClassName("detailSection")(5).getElementsByTagName("span")(2).innerText returns "Title VD".

Officer_Director_Detail3 = Doc.getElementsByClassName("detailSection")(5).getElementsByTagName("span")(3).innerText returns "6922 SOUTH LAGOON DR PANAMA CITY BEACH, FL 32408".

I have tried using "br" and "div", but neither returns the name. Help!

2 answers:

Answer 0 (score: 1)

Try this code and pick the field you are interested in (txt(i)). The name "GUNN, BETTY" is at txt(5).

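Dim txt As Variant, i As Long
' Split the section's text into lines; the name is one of the entries.
txt = Split(Doc.getElementsByClassName("detailSection")(5).innerText, vbCrLf)
For i = 0 To UBound(txt)
    MsgBox i & ": " & txt(i)
Next i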
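
The spans never return the name because, in the HTML quoted in the question, "GUNN, BETTY" is a bare text node sitting directly inside the detailSection div rather than inside a span. As an alternative to splitting innerText, here is a minimal sketch that walks the div's child nodes and keeps only the text nodes. DirectTextNodes and the demo markup are hypothetical, the demo assumes Windows with a late-bound "htmlfile" (MSHTML) document, and it has not been run against the live page.

```vba
Option Explicit

' Hypothetical helper: collects the non-empty text nodes that are direct
' children of parent (nodeType 3 = text node in the DOM).
Public Function DirectTextNodes(ByVal parent As Object) As Collection
    Dim result As New Collection
    Dim i As Long, s As String

    For i = 0 To parent.childNodes.Length - 1
        With parent.childNodes.Item(i)
            If .nodeType = 3 Then
                ' Drop line breaks and surrounding spaces left over from the markup.
                s = Trim$(Replace(Replace(.nodeValue, vbCr, ""), vbLf, ""))
                If Len(s) > 0 Then result.Add s
            End If
        End With
    Next i

    Set DirectTextNodes = result
End Function

Public Sub DemoDirectTextNodes()
    Dim html As Object, entry As Variant

    ' Late-bound MSHTML document; the markup is a cut-down copy of the question's snippet.
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = "<div><span>Title&nbsp;VD</span><br><br>" & _
                          "GUNN, BETTY <div>6922 SOUTH LAGOON DR</div></div>"

    ' Only text that is not wrapped in a child element is printed.
    For Each entry In DirectTextNodes(html.body.getElementsByTagName("div")(0))
        Debug.Print entry
    Next entry
End Sub
```

Against the real page, the same helper could presumably be called as DirectTextNodes(Doc.getElementsByClassName("detailSection")(5)). Which entry holds the name depends on the page's actual markup, so inspect the returned collection before relying on a fixed position.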

Answer 1 (score: 0)

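Unfortunately, in Selenium you cannot use XPath alone to target the text node and get this string, so select the enclosing element with XPath and use Split on its text. This requires installing SeleniumBasic and adding a reference to the Selenium Type Library.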

Option Explicit

' Requires SeleniumBasic and a reference to the Selenium Type Library.
Public Sub GetInfo()
    Dim d As WebDriver
    Set d = New ChromeDriver
    Const URL = "http://search.sunbiz.org/Inquiry/CorporationSearch/SearchResultDetail?inquiryType=DocumentNumber&aggregateId=domnp-763425-68d63992-2677-4bd5-9e1e-3f63ef505809&directionType=Initial&searchNameOrder=AMBASSADORBEACHOWNERSASSOCIATI%207634250&searchTerm=763425"
    With d
        .AddArgument "--headless"   ' run Chrome without a visible window
        .Start "Chrome"
        .get URL
        ' Split the section's text on line feeds and print the sixth entry (index 5).
        Debug.Print Split(.FindElementByXPath("//*[@id='maincontent']/div[2]/div[6]").Text, Chr$(10))(5)
        .Quit
    End With
End Sub
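
The two answers split on different separators: answer 0 uses vbCrLf on innerText, while answer 1 uses Chr$(10) on Selenium's .Text. A small sketch of a helper that normalises line endings before splitting, so the same loop can be tried against either source, might look like this. SplitLines is a hypothetical name and the sample string is made up for illustration. The index that holds the name can differ between the two sources and should be confirmed by printing every entry.

```vba
Option Explicit

' Hypothetical helper: splits a string into lines whether the source used
' CRLF, a lone CR, or a lone LF as the separator.
Public Function SplitLines(ByVal source As String) As String()
    Dim normalised As String
    normalised = Replace(source, vbCrLf, vbLf)      ' CRLF -> LF
    normalised = Replace(normalised, vbCr, vbLf)    ' any remaining CR -> LF
    SplitLines = Split(normalised, vbLf)
End Function

Public Sub DemoSplitLines()
    Dim parts() As String, i As Long

    ' Made-up sample mixing both line-ending styles.
    parts = SplitLines("Title VD" & vbCrLf & "GUNN, BETTY" & vbLf & "6922 SOUTH LAGOON DR")

    ' List every entry with its index so the right one can be picked by eye.
    For i = LBound(parts) To UBound(parts)
        Debug.Print i & ": " & Trim$(parts(i))
    Next i
End Sub
```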