How to separate all the text in nested divs under a single element ID

Time: 2016-10-21 03:12:45

Tags: html python-2.7 selenium

Under the parent link http://www.pajak.go.id/statistik-amnesti, in the monitoring tab:

Using Python, I am trying to extract the table in the top-left corner of https://monitoringamnesti.pajak.go.id/viewer/dashboard?dashboardguid=90a16bf8-d418-4ed4-8160-7f883f601dd0&v=636126392121123334&style=Default

My code:

import selenium.webdriver as driver

browser = driver.Chrome()
url = "https://monitoringamnesti.pajak.go.id/viewer/public/dashboard?name=Monitoring_Amnesti_Pajak"
browser.get(url)

# textContent concatenates the text of every nested div into one string
all_text = browser.execute_script("return document.getElementById('SimpleDataGrid-viewport').textContent")

However, all the text comes back run together. Is there a way to get all the information in the table as a list or data frame?

HTML code:

<div id="SimpleDataGrid-viewport" class="datagrid-viewport" style="width: 120px; height: 376px;">
<div id="SimpleDataGrid-spacer-clip" class="datagrid-spacer-clip clip _hidden" style="width: 22px; height: 23px;">
    <div id="SimpleDataGrid-spacer" class="datagrid-spacer" style="width: 22px; height: 23px;">
        <div class="row">
            <div class="cell blank" style="border-bottom-color: rgb(0, 153, 195); width: 22px; height: 11px;">&nbsp;
            </div>
        </div>
    </div>
</div>
<div id="SimpleDataGrid-head-clip" class="datagrid-head-clip clip" style="width: 120px; margin-left: 0px; height: 23px;">
<div id="SimpleDataGrid-head" class="datagrid-head" style="top: 0px; left: 0px; width: 552px;">
    <div class="row">
        <div class="cell column0 text sortable" data-type="text" data-index="0" data-sortorder="unsorted" style="border-bottom-color: rgb(0, 153, 195); width: 161px; height: 11px;">Jenis<em class="unsorted" data-sortorder="unsorted"></em>
        </div>
        <div class="cell column1 number sortable" data-type="number" data-index="1" data-sortorder="unsorted" style="border-bottom-color: rgb(0, 153, 195); width: 52px; height: 11px;">Juli<em class="unsorted" data-sortorder="unsorted"></em>
        </div>
        <div class="cell column2 number sortable" data-type="number" data-index="2" data-sortorder="unsorted" style="border-bottom-color: rgb(0, 153, 195); width: 58px; height: 11px;">Agustus<em class="unsorted" data-sortorder="unsorted"></em>
        </div>
        <div class="cell column3 number sortable" data-type="number" data-index="3" data-sortorder="unsorted" style="border-bottom-color: rgb(0, 153, 195); width: 73px; height: 11px;">September<em class="unsorted" data-sortorder="unsorted"></em></div>
        <div class="cell column4 number sortable" data-type="number" data-index="4" data-sortorder="unsorted" style="border-bottom-color: rgb(0, 153, 195); width: 58px; height: 11px;">Oktober<em class="unsorted" data-sortorder="unsorted"></em>
        </div>

1 answer:

Answer 0 (score: 0)

As Sudharsan said, the URL you provided asks for login credentials, so we cannot see what you are referring to.

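If you want to do this with Selenium, you need to tell Selenium to pull out the table of interest and then output it in whatever form you need, rather than simply having Selenium execute JavaScript.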

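Based on the code you provided, I am guessing that something like the following will meet your needs; the CSS selector finds all the "columns" in each row of the grid: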

# print the text of each cell matched by the selector; it is scoped to SimpleDataGrid-head, so this covers the header cells
for cell in browser.find_elements_by_css_selector("div[id='SimpleDataGrid-head'] div.row div.cell"):
    print(cell.text)
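
To get a list of rows rather than one printed cell at a time, the same idea can be applied per row. The following is a minimal sketch, not a tested solution for this dashboard: it assumes every row of the grid (header and body) is a `div.row` containing `div.cell` elements somewhere under `SimpleDataGrid-viewport`, which the posted HTML only confirms for the header. The helper names `grid_rows` and `to_records` are hypothetical. It reads `textContent` (as the question's code does) instead of `.text`, on the assumption that some cells may be clipped out of view.

```python
# Locator strategy string; this is the value of selenium's By.CSS_SELECTOR.
CSS = "css selector"


def grid_rows(container):
    """Return one list of cell strings per non-empty div.row under container.

    `container` is anything with a Selenium-style find_elements(by, value)
    method, for example a WebElement for the grid viewport.
    """
    rows = []
    for row in container.find_elements(CSS, "div.row"):
        cells = row.find_elements(CSS, "div.cell")
        # textContent also returns text of cells that are not currently visible.
        texts = [(cell.get_attribute("textContent") or "").strip() for cell in cells]
        if any(texts):  # skip spacer rows that only hold blank cells
            rows.append(texts)
    return rows


def to_records(rows):
    """Treat the first row as the header and map each later row onto it."""
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, values)) for values in body]


# Assumed usage with the browser object from the question:
# container = browser.find_element(CSS, "#SimpleDataGrid-viewport")
# records = to_records(grid_rows(container))
```

If pandas is available, passing the resulting list of dictionaries to `pandas.DataFrame(records)` should give the data frame asked for in the question.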