Scraped data is printed repeatedly

Time: 2017-06-24 17:32:08

Tags: python-3.x selenium selenium-webdriver web-scraping

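I wrote a script in Python using Selenium, and when I run it the data comes out in a strangely repetitive format. I want to parse the first table from the web page referenced in my script.

Here is the script I have tried so far to parse the first table: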

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://fantasy.premierleague.com/player-list/")

table_data = driver.find_elements_by_xpath("//table[@class='ism-table']")[0]

list_rows = []
for items in table_data.find_elements_by_xpath(".//tr"):
    list_cells = []
    for item in items.find_elements_by_xpath(".//td"):
        list_cells.append(item.text)
    list_rows.append(list_cells)
    print(list_rows)

driver.quit()

You can see the result I am getting at this link: "https://www.dropbox.com/s/c4n08jt2k7amx4j/Parsed%20table%20data.txt?dl=0"

The HTML elements in which the data is stored:

<table class="ism-table">
        <colgroup>
            <col class="ismCol1">
            <col class="ismCol2">
            <col class="ismCol3">
            <col class="ismCol4">
        </colgroup>
        <thead>
            <tr>
                <th>Player</th>
                <th>Team</th>
                <th>Points</th>
                <th>Cost</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>Courtois</td>
                <td>Chelsea</td>
                <td>141</td>
                <td>£5.9</td>
            </tr>

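2 answers:

Answer 0: (score: 0)

Reset list_cells after each pass of the inner loop, and print list_rows only once, after the outer loop has finished: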

list_rows = []
for items in table_data.find_elements_by_xpath(".//tr"):
    list_cells = []
    for item in items.find_elements_by_xpath(".//td"):
        list_cells.append(item.text)
    list_rows.append(list_cells)
    list_cells = []

print(list_rows)
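
The repetition can be reproduced without a browser, which may make the cause easier to see. The sketch below is only an illustration: the helper name collect and the two placeholder rows are made up, and only the Courtois row comes from the HTML excerpt in the question. It suggests that the repeated output comes from printing the growing list_rows on every pass of the outer loop, while list_cells is already a fresh list at the top of each pass.

```python
# Rows standing in for the scraped table. Only the first one is taken from
# the question's HTML excerpt; the others are placeholders.
source_rows = [
    ["Courtois", "Chelsea", "141", "£5.9"],
    ["Player B", "Team B", "0", "£0.0"],
    ["Player C", "Team C", "0", "£0.0"],
]


def collect(rows, print_inside_loop):
    """Rebuild the rows the way the script does and return the lines it would print."""
    printed = []
    list_rows = []
    for row in rows:
        list_cells = []  # a new list on every pass of the outer loop
        for cell in row:
            list_cells.append(cell)
        list_rows.append(list_cells)
        if print_inside_loop:
            # Mirrors the question: the whole list so far is printed each time.
            printed.append(str(list_rows))
    if not print_inside_loop:
        # Mirrors the answer: the finished list is printed once.
        printed.append(str(list_rows))
    return printed


inside = collect(source_rows, print_inside_loop=True)
outside = collect(source_rows, print_inside_loop=False)

print(len(inside), "lines when printing inside the loop")
print(len(outside), "line when printing after the loop")
```

With three rows, printing inside the loop produces three lines, each one repeating all earlier rows, while printing after the loop produces a single line with the complete list.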

Answer 1: (score: 0)

This is the answer I was looking for:

from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://fantasy.premierleague.com/player-list/")

table_data = driver.find_elements_by_xpath("//table[@class='ism-table']")[0]

list_rows = []
for items in table_data.find_elements_by_xpath(".//tr"):
    list_cells = []
    for item in items.find_elements_by_xpath(".//td"):
        list_cells.append(item.text)
    list_rows.append(list_cells)

for data in list_rows:
    print(data)

driver.quit()
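
Newer Selenium releases no longer ship the find_elements_by_xpath helpers and expect find_elements(By.XPATH, ...) instead. The following is a minimal sketch of the same extraction in that style, not a tested replacement: the names XPATH, rows_from_table and main are assumptions introduced here, the page may no longer serve the markup shown in the question, and unlike the script above this version skips rows without td cells (such as the header row).

```python
XPATH = "xpath"  # same string value as selenium.webdriver.common.by.By.XPATH


def rows_from_table(table):
    """Return one list of cell texts for every <tr> that contains <td> cells."""
    rows = []
    for tr in table.find_elements(XPATH, ".//tr"):
        cells = [td.text for td in tr.find_elements(XPATH, ".//td")]
        if cells:  # the header row only has <th> cells, so it yields no <td>
            rows.append(cells)
    return rows


def main():
    # Imported here so rows_from_table can be reused without a browser.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get("https://fantasy.premierleague.com/player-list/")
        table = driver.find_elements(XPATH, "//table[@class='ism-table']")[0]
        for row in rows_from_table(table):
            print(row)  # one row per line, printed after collection is done
    finally:
        driver.quit()  # close the browser even if the lookup fails
```

Calling main() opens Chrome and prints each row on its own line. Keeping the extraction in rows_from_table means it can be exercised with any object that offers a compatible find_elements method.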