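Link to the site: http://www.tennisabstract.com/cgi-bin/player-classic.cgi?p=RafaelNadal
I'm trying to write code that iterates over every row of a table and extracts each element from that row. The layout I'm aiming for is: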
Row1Element1, Row1Element2, Row1Element3
Row2Element1, Row2Element2, Row2Element3
Row3Element1, Row3Element2, Row3Element3
I've made two main attempts at this.
Attempt 1:
rows = driver.find_elements_by_xpath('//table//body//tr')
elements = rows.find_elements_by_xpath('//td')
# this gets all rows in the table, but then gets all elements on the page,
# not just the table
Attempt 2:
driver.find_elements_by_xpath('//table//body//tr//td')
# this gets all the elements that I want, but makes no distinction as to which
# row each element belongs to
Thanks for your help.
Answer 0 (score: 1)
You can read the table headers and use each header's index to pick the matching cell out of every row.
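from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("http://www.tennisabstract.com/cgi-bin/player-classic.cgi?p=RafaelNadal")

# Header texts in column order. find_elements(By..., ...) works in Selenium 3 and 4;
# the find_elements_by_* helpers were removed in later Selenium 4 releases.
table_headers = [th.text.strip() for th in driver.find_elements(By.CSS_SELECTOR, "#matchheader th")]
rows = driver.find_elements(By.CSS_SELECTOR, "#matches tbody > tr")

# Column positions of the fields to print
date_index = table_headers.index("Date")
tournament_index = table_headers.index("Tournament")
score_index = table_headers.index("Score")

for row in rows:
    # Searching from the row element returns only that row's cells
    table_data = row.find_elements(By.TAG_NAME, "td")
    print(table_data[date_index].text, table_data[tournament_index].text, table_data[score_index].text)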
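If you need more than a few columns, looking up each index by hand gets repetitive. As a rough sketch (not part of the original answer), the header list can be zipped with each row's cell texts to build one dictionary per row. The function name `rows_to_records` and the sample data are hypothetical; with Selenium you would build `rows_of_cells` from the `.text` of each row's `td` elements.

```python
def rows_to_records(headers, rows_of_cells):
    """Pair each row's cell texts with the header names.

    headers: list of column names (for example, the table_headers list).
    rows_of_cells: list of rows, each a list of cell strings.
    Rows whose length does not match the headers are skipped.
    If two headers share a name, the later column overwrites the earlier one.
    """
    records = []
    for cells in rows_of_cells:
        if len(cells) != len(headers):
            continue  # e.g. a spacer row or a row using colspan
        records.append(dict(zip(headers, cells)))
    return records


# Hypothetical sample data standing in for the scraped cell texts.
headers = ["Date", "Tournament", "Score"]
rows_of_cells = [
    ["2019-01-01", "Example Open", "6-4 6-4"],
    ["2019-01-02", "Example Open", "7-5 6-3"],
    ["spacer"],
]

records = rows_to_records(headers, rows_of_cells)
for record in records:
    print(record["Date"], record["Tournament"], record["Score"])
```

Keying by header name means the code keeps working if the site reorders its columns, at the cost of silently dropping rows that do not line up with the header.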
Answer 1 (score: 0)
Here is a locator that matches every row of the table you want to query:
XPATH: //table[@id="matches"]//tbody//tr
First, the imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
For each row:
driver.get('http://www.tennisabstract.com/cgi-bin/player-classic.cgi?p=RafaelNadal')
rows = WebDriverWait(driver, 20).until(EC.visibility_of_all_elements_located((By.XPATH, '//table[@id="matches"]//tbody//tr')))
for row in rows:
    print(row.text)
Or for each cell:
for row in rows:
    cols = row.find_elements(By.TAG_NAME, 'td')
    for col in cols:
        print(col.text)
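A note on why Attempt 1 in the question returned every cell on the page: an XPath that starts with `//` is evaluated from the document root, even when it is called on a row element. Prefixing it with a dot (`./td` or `.//td`) scopes the search to that row. The sketch below collects the cells into the row-by-row layout the question asks for. `table_to_grid`, `FakeRow` and `FakeCell` are hypothetical names; the fakes only stand in for Selenium elements so the example runs without a browser, and the string `"xpath"` is the value of `By.XPATH`.

```python
def table_to_grid(rows):
    """Return one list of cell texts per row element.

    "./td" starts with a dot, so a real WebElement limits the search to the
    row's own children; "//td" would search the whole document instead.
    """
    grid = []
    for row in rows:
        cells = row.find_elements("xpath", "./td")  # "xpath" == By.XPATH
        grid.append([cell.text for cell in cells])
    return grid


# Minimal stand-ins for WebElement. They ignore the locator and just
# return their own cells, which is what a scoped "./td" search would do.
class FakeCell:
    def __init__(self, text):
        self.text = text


class FakeRow:
    def __init__(self, texts):
        self._cells = [FakeCell(t) for t in texts]

    def find_elements(self, by, value):
        return self._cells


fake_rows = [
    FakeRow(["Row1Element1", "Row1Element2", "Row1Element3"]),
    FakeRow(["Row2Element1", "Row2Element2", "Row2Element3"]),
]

grid = table_to_grid(fake_rows)
for line in grid:
    print(", ".join(line))
```

With a real driver, pass the `rows` list located above instead of `fake_rows`.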