Question

I want to scrape this table and get all of its details. The HTML looks like this:

<table id="bnConnectionTemplate:r1:0:tl1" class="detailTable" cellpadding="0" cellspacing="0" border="0" summary="">
<tbody>
    <tr>
        <th>Name: </th>
        <td>EVERBRITE CORPORATION LIMITED</td>
    </tr>
    <tr>
        <th><abbr title="Australian Company Number">ACN: </abbr></th>
        <td>104 436 704</td>
    </tr>
    <tr>
        <th><abbr title="Australian Business Number">ABN: </abbr></th>
        <td><a id="bnConnectionTemplate:r1:0:j_id__ctru57pc2" class="contentLink af_goLink" href="http://abr.business.gov.au/Search.aspx?SearchText=96%20104%20436%20704" target="_blank"><span title="">96 104 436 704</span><span class="hiddenHint"> (External Link)</span></a></td>
    </tr>
    <tr>
        <th>Registration date: </th>
        <td>15/04/2003</td>
    </tr>
    <tr>
        <th>Next review date: </th>
        <td>15/04/2013</td>
    </tr>
    <tr>
        <th>Former name(s): </th>
        <td>VISIONGLOW GLOBAL LIMITED</td>
    </tr>
    <tr>
        <th></th>
        <td></td>
    </tr>
    <tr>
        <th>Status: </th>
        <td>Deregistered</td>
    </tr>
    <tr>
        <th>Date deregistered: </th>
        <td>7/09/2012</td>
    </tr>
    <tr>
        <th>Type: </th>
        <td>Australian Public Company, Limited By Shares</td>
    </tr>
    <tr>
        <th>Locality of registered office: </th>
        <td></td>
    </tr>
    <tr>
        <th>Regulator: </th>
        <td>Australian Securities &amp; Investments Commission</td>
    </tr>
</tbody>
</table>

My problem is that I can't get the table, even when I try to select it by its class or ID.

import requests
from bs4 import BeautifulSoup

source = requests.get("https://connectonline.asic.gov.au/RegistrySearch/faces/landing/panelSearch.jspx?searchText=104+436+704&searchType=OrgAndBusNm&_adf.ctrl-state=139sjjyk9g_15").text
soup = BeautifulSoup(source, 'lxml')

I tried:

table = soup.find('table', class_='detailTable')  # returns None

table = soup.find('table', id="bnConnectionTemplate:r1:0:tl1")  # returns None
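
When both lookups return None, one thing worth checking is whether the table markup is present at all in the text that requests downloaded. Pages that build their content with JavaScript, or that redirect to a landing page, can return HTML without the table. Whether that is what happens on this site is an assumption, not something confirmed here. A minimal sketch of the check, where `missing_markers` and `sample_source` are hypothetical names used only for illustration:

```python
def missing_markers(html_text, markers):
    """Return the markers that do not occur anywhere in html_text."""
    return [m for m in markers if m not in html_text]


# Hypothetical stand-in for the `source` string from requests.get(...).text.
# A JavaScript-driven page may ship only a shell like this one.
sample_source = (
    '<html><body><div id="root"></div>'
    '<script src="app.js"></script></body></html>'
)

# Substrings taken from the table shown in the question.
markers = ['class="detailTable"', 'bnConnectionTemplate:r1:0:tl1']

absent = missing_markers(sample_source, markers)
if absent:
    print("Not in the downloaded HTML:", absent)
else:
    print("Markup is present; look at the selector or the parser instead.")
```

If the markers are missing from the real `source`, no BeautifulSoup selector can find the table, and the page would have to be fetched some other way.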

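At this point I'm confused about why this is happening. I've scraped pages with these same calls in the past and they worked fine. Any help would be appreciated.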
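
For reference, here is a rough sketch of the result I am after once the table HTML is available as a string: each `th` label mapped to the text of its `td`. The helper name `table_to_dict` and the shortened `sample_html` are made up for illustration, and the sketch uses the built-in `html.parser` backend so that it does not depend on lxml.

```python
from bs4 import BeautifulSoup

# Shortened copy of the table from the question.
sample_html = """
<table id="bnConnectionTemplate:r1:0:tl1" class="detailTable">
<tbody>
    <tr><th>Name: </th><td>EVERBRITE CORPORATION LIMITED</td></tr>
    <tr><th><abbr title="Australian Company Number">ACN: </abbr></th><td>104 436 704</td></tr>
    <tr><th></th><td></td></tr>
    <tr><th>Status: </th><td>Deregistered</td></tr>
    <tr><th>Locality of registered office: </th><td></td></tr>
</tbody>
</table>
"""


def table_to_dict(html_text):
    """Map each non-empty <th> label to the text of the <td> in the same row."""
    soup = BeautifulSoup(html_text, "html.parser")
    table = soup.find("table", class_="detailTable")
    if table is None:
        return {}  # table not present in this HTML
    details = {}
    for row in table.find_all("tr"):
        header, cell = row.find("th"), row.find("td")
        if header is None or cell is None:
            continue
        label = header.get_text(strip=True).rstrip(":")
        if label:  # skip the empty spacer row
            details[label] = cell.get_text(" ", strip=True)
    return details


details = table_to_dict(sample_html)
print(details)
```

This only works when the table is actually in the string being parsed, which is the part that fails for me.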

Trying to web scrape a table but can't

0 answers: