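I am trying to scrape the following page (only page 1, for the purposes of this question):
https://www.sportstats.ca/display-results.xhtml?raceid=4886
I can use Selenium to get the page source and then parse it, but not all of the data I am looking for is in the source. Some of it only appears after clicking on elements.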
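For example, for the first person I can get all the visible fields from the source. But clicking the + reveals more data that I would like to scrape, such as the "Chip Time" (01:15:29.9) and the city (Oakville) that pops up on the right after clicking a person's +.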
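I don't know how to identify the element that has to be clicked to expand the +, and even once it is clicked, I don't know how to find the values I am looking for.
Any tips would be great.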
Answer 0 (score: 1)
Here is sample code for what you asked. It uses Python, Selenium, and the ChromeDriver executable (chromedriver.exe).
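from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
import time
import csv

# newline='' keeps the csv module from writing blank lines on Windows (Python 3).
myfile = open('demo_detail.csv', 'w', newline='')
wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
# Selenium 4 style; on Selenium 3 pass the path directly: webdriver.Chrome('./chromedriver.exe')
driver = webdriver.Chrome(service=Service('./chromedriver.exe'))
csv_heading = ["", "", "BIB", "NAME", "CATEGORY", "RANK", "GENDER PLACE", "CAT. PLACE", "GUN TIME", "SPLIT NAME", "SPLIT DISTANCE", "SPLIT TIME", "PACE", "DISTANCE", "RACE TIME", "OVERALL (/814)", "GENDER (/431)", "CATEGORY (/38)", "TIME OF DAY"]
wr.writerow(csv_heading)
count = 0
try:
    url = "https://www.sportstats.ca/display-results.xhtml?raceid=4886"
    driver.get(url)
    # One <tr role="row"> per runner in the overview table.
    table_tr = driver.find_elements(By.XPATH, "//table[@class='results overview-result']/tbody/tr[@role='row']")
    for tr in table_tr:
        lst = []
        count = count + 1
        # Collect the visible summary cells for this runner.
        table_td = tr.find_elements(By.TAG_NAME, "td")
        for td in table_td:
            lst.append(td.text)
        # The second cell holds the "+" toggle; click it to expand the detail row.
        table_td[1].find_element(By.TAG_NAME, "div").click()
        time.sleep(5)  # crude fixed wait for the detail panel to load
        # Append every cell of the expanded detail table to the same CSV row.
        for demo_tr in driver.find_elements(By.XPATH, "//tr[@class='ui-expanded-row-content ui-widget-content view-details']/td/div/div/table/tbody/tr"):
            for demo_td in demo_tr.find_elements(By.TAG_NAME, "td"):
                lst.append(demo_td.text)
        wr.writerow(lst)
        # Click again to collapse the row, so only one detail row is open at a time.
        table_td[1].find_element(By.TAG_NAME, "div").click()
        time.sleep(5)
        print(count)
except Exception as e:
    print(e)
finally:
    # Always release the browser and flush the CSV, even after an error.
    driver.quit()
    myfile.close()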
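The fixed time.sleep(5) calls after each click make the scrape slow, and they can still be too short on a slow connection. A possible alternative is to poll until the expanded rows actually appear. The sketch below is only an illustration: wait_until is a hypothetical helper name, and DETAIL_XPATH in the usage comment stands for the detail-row XPath used above. Selenium also ships its own WebDriverWait class (in selenium.webdriver.support.ui) that serves the same purpose.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.25):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have passed without a truthy result.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)


# Hypothetical usage inside the row loop, in place of time.sleep(5) after the click
# (assumes `driver`, `By`, and a DETAIL_XPATH string holding the detail-row XPath):
#     detail_rows = wait_until(lambda: driver.find_elements(By.XPATH, DETAIL_XPATH))
```

Because find_elements returns an empty list when nothing matches, the lambda stays falsy until the detail rows have been rendered, and the helper then returns the list of rows directly.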