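I've written a script in Python using Selenium, and when I run it I get the data in a strange, repeated format. I want to parse the first table from the webpage mentioned in my script.
Here is the script I have tried so far to parse the first table: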
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://fantasy.premierleague.com/player-list/")
table_data = driver.find_elements_by_xpath("//table[@class='ism-table']")[0]
list_rows = []
for items in table_data.find_elements_by_xpath(".//tr"):
    list_cells = []
    for item in items.find_elements_by_xpath(".//td"):
        list_cells.append(item.text)
        list_rows.append(list_cells)
print(list_rows)
driver.quit()
You can see the result I'm getting by following this link: https://www.dropbox.com/s/c4n08jt2k7amx4j/Parsed%20table%20data.txt?dl=0
The HTML elements in which the data is stored:
<table class="ism-table">
  <colgroup>
    <col class="ismCol1">
    <col class="ismCol2">
    <col class="ismCol3">
    <col class="ismCol4">
  </colgroup>
  <thead>
    <tr>
      <th>Player</th>
      <th>Team</th>
      <th>Points</th>
      <th>Cost</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Courtois</td>
      <td>Chelsea</td>
      <td>141</td>
      <td>£5.9</td>
    </tr>
Answer 0 (score: 0)
You are not clearing the value of list_cells after each pass of the inner loop:
list_rows = []
for items in table_data.find_elements_by_xpath(".//tr"):
    list_cells = []
    for item in items.find_elements_by_xpath(".//td"):
        list_cells.append(item.text)
    # Store the finished row once, then reset the cell list.
    list_rows.append(list_cells)
    list_cells = []
print(list_rows)
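The effect of where list_rows.append(list_cells) sits can be reproduced without a browser. The sketch below is only an illustration: it assumes the repeated output came from appending inside the inner loop, and it replaces the Selenium elements with a plain list of lists. The first row comes from the HTML in the question; the second row is made-up placeholder data, and both function names are hypothetical.

```python
# Stand-in for the parsed table: each inner list plays the role of the
# <td> texts of one <tr>. The second row is placeholder data.
table = [
    ["Courtois", "Chelsea", "141", "£5.9"],
    ["Player B", "Team B", "0", "£0.0"],
]


def rows_append_in_inner_loop(table):
    """Append the row list once per cell (assumed cause of the repeats)."""
    list_rows = []
    for row in table:
        list_cells = []
        for cell in row:
            list_cells.append(cell)
            # The same list object is appended again for every cell.
            list_rows.append(list_cells)
    return list_rows


def rows_append_in_outer_loop(table):
    """Append the row list once per row, after its cells are collected."""
    list_rows = []
    for row in table:
        list_cells = []
        for cell in row:
            list_cells.append(cell)
        list_rows.append(list_cells)
    return list_rows


buggy = rows_append_in_inner_loop(table)
fixed = rows_append_in_outer_loop(table)

print(len(buggy))  # one entry per cell
print(len(fixed))  # one entry per row
for row in fixed:
    print(row)
```

Because list_cells is created afresh at the top of every outer iteration, each row already starts from an empty list; what changes the output in this sketch is how often the list is appended to list_rows.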
Answer 1 (score: 0)
This is the answer I was expecting:
from selenium import webdriver

driver = webdriver.Chrome()
driver.get("https://fantasy.premierleague.com/player-list/")
table_data = driver.find_elements_by_xpath("//table[@class='ism-table']")[0]
list_rows = []
for items in table_data.find_elements_by_xpath(".//tr"):
    list_cells = []
    for item in items.find_elements_by_xpath(".//td"):
        list_cells.append(item.text)
    # Append each row once, after all of its cells have been collected.
    list_rows.append(list_cells)
# Print one row per line instead of the whole nested list at once.
for data in list_rows:
    print(data)
driver.quit()
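The find_elements_by_xpath helpers used above were removed in later Selenium 4 releases, so on a current install the same idea might look like the sketch below. It is not part of the original answers: extract_rows and main are hypothetical names, the URL and the ism-table class are copied from the question and may no longer match the live page, and skipping rows without td cells (such as the header row) is an added choice.

```python
def extract_rows(table):
    """Return one list of cell texts for each <tr> that has <td> cells."""
    rows = []
    # "xpath" is the string value of selenium's By.XPATH constant.
    for tr in table.find_elements("xpath", ".//tr"):
        cells = [td.text for td in tr.find_elements("xpath", ".//td")]
        if cells:  # the header row only has <th> cells, so it is skipped
            rows.append(cells)
    return rows


def main():
    # Imported here so extract_rows can be reused without a browser.
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get("https://fantasy.premierleague.com/player-list/")
        table = driver.find_element(By.XPATH, "//table[@class='ism-table']")
        for row in extract_rows(table):
            print(row)
    finally:
        # Close the browser even if the lookup above fails.
        driver.quit()
```

Calling main() runs the scrape. Keeping the row extraction in its own function means it can be exercised with any object that offers find_elements and text, which makes the loop logic easy to check separately from the browser.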