Question

I am trying to scrape information from multiple web pages.

<div class="detailSection">
        <span>Officer/Director Detail</span>
            <span><b>Name & Address</b></span>
            <br/>
            <br/>
     <span>Title&nbsp;VD</span>
     <br/>
     <br/>
GUNN, BETTY    <span>

<div>
6922 SOUTH LAGOON DR<br/>
         PANAMA CITY BEACH, FL 32408<br/>
</div>

I am able to extract all of the information except the name "GUNN, BETTY".

The web page is http://search.sunbiz.org/Inquiry/CorporationSearch/SearchResultDetail?inquiryType=DocumentNumber&aggregateId=domnp-763425-68d63992-2677-4bd5-9e1e-3f63ef505809&directionType=Initial&searchNameOrder=AMBASSADORBEACHOWNERSASSOCIATI%207634250&searchTerm=763425

Officer_Director_Detail2 = Doc.getElementsByClassName("detailSection")(5).getElementsByTagName("span")(2).innerText copies "Title VD".

Officer_Director_Detail3 = Doc.getElementsByClassName("detailSection")(5).getElementsByTagName("span")(3).innerText copies "6922 SOUTH LAGOON DR PANAMA CITY BEACH, FL 32408".

I have tried using "br" and "div", but neither one copies the name. HELP!!!

Answer 1

Try this code and pick the field you are interested in (txt(i)). BETTY GUNN is at txt(5).

txt = Split(doc.getElementsByClassName("detailSection")(5).innerText, vbCrLf)
For i = 0 To UBound(txt)
    MsgBox i & ":" & txt(i)
Next i
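
Judging by the markup in the question, the name seems to be missing from every span because it is a bare text node sitting directly inside the detailSection div. As an alternative to a fixed line index, here is a minimal sketch that collects the non-empty text nodes among an element's direct children. TextNodeValues is a hypothetical helper name, and the sketch assumes the name really is a direct child of the section, which the snippet suggests but does not guarantee.

```vba
' Hypothetical helper: returns the trimmed values of all non-empty text nodes
' that are direct children of the given element.
Public Function TextNodeValues(ByVal parent As Object) As Collection
    Dim result As New Collection
    Dim node As Object
    Dim i As Long, s As String

    For i = 0 To parent.childNodes.Length - 1
        Set node = parent.childNodes.Item(i)
        If node.NodeType = 3 Then                      ' 3 = text node
            ' Drop line breaks, then trim leading and trailing spaces
            s = Replace(Replace(node.NodeValue, vbCr, ""), vbLf, "")
            s = Trim$(s)
            If Len(s) > 0 Then result.Add s
        End If
    Next i

    Set TextNodeValues = result
End Function
```

With the document from the question, it could be called as Set names = TextNodeValues(Doc.getElementsByClassName("detailSection")(5)), after which each item in names can be inspected in turn.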

Answer 2

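Unfortunately, Selenium cannot return a text node through XPath, only elements, so the string is obtained by applying Split to the element's text. This needs a reference to the Selenium Type Library, which becomes available after installing SeleniumBasic.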

Option Explicit
Public Sub GetInfo()
    Dim d As WebDriver, arr() As String
    Set d = New ChromeDriver
    Const URL = "http://search.sunbiz.org/Inquiry/CorporationSearch/SearchResultDetail?inquiryType=DocumentNumber&aggregateId=domnp-763425-68d63992-2677-4bd5-9e1e-3f63ef505809&directionType=Initial&searchNameOrder=AMBASSADORBEACHOWNERSASSOCIATI%207634250&searchTerm=763425"
    With d
        .AddArgument "--headless"
        .Start "Chrome"
        .get URL
        Debug.Print Split(.FindElementByXPath("//*[@id='maincontent']/div[2]/div[6]").Text, Chr$(10))(5)
        .Quit
    End With
End Sub
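
Both answers rely on the name being at position 5 after the split, which breaks if the page adds or drops a line. A more tolerant sketch is to look for the first non-blank line that follows the "Title" line. LineAfterTitle is a hypothetical helper, and it assumes the section text has the layout shown in the question (a line starting with "Title", then the name). It only returns the first match, so a section listing several officers would need a loop instead.

```vba
' Hypothetical helper: returns the first non-blank line that follows the
' first line starting with "Title", or "" if there is no such line.
Public Function LineAfterTitle(ByVal sectionText As String) As String
    Dim parts() As String
    Dim i As Long, s As String, seenTitle As Boolean

    ' Normalise CRLF and CR to LF so the same split works for
    ' innerText (CRLF) and Selenium's .Text (LF)
    sectionText = Replace(Replace(sectionText, vbCrLf, vbLf), vbCr, vbLf)
    parts = Split(sectionText, vbLf)

    For i = LBound(parts) To UBound(parts)
        s = Trim$(parts(i))
        If seenTitle And Len(s) > 0 Then
            LineAfterTitle = s
            Exit Function
        End If
        If Left$(s, 5) = "Title" Then seenTitle = True
    Next i
End Function
```

In the Selenium code above, the Debug.Print line could then pass the element's .Text to LineAfterTitle instead of indexing the Split result with (5).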

Web scraping using VBA
