Python Selenium: Scraping Hidden Data

Time: 2015-10-30 02:47:56

Tags: python selenium web-scraping

I'm trying to scrape the following page (only page 1, for the purposes of this question):

https://www.sportstats.ca/display-results.xhtml?raceid=4886

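I can use Selenium to get the page source and then parse it, but not all of the data I'm looking for is in the source. Some of it only appears after clicking on elements.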

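For example, for the first person I can get all of the visible fields from the source. But if you click the +, there is more data I would like to scrape: for example the "Chip Time" (01:15:29.9), as well as the city (Oakville) that pops up on the right after clicking the + for a person.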

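I don't know how to identify the element that needs to be clicked to expand the +, and even once it is clicked, I don't know how to find the values I'm looking for.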

Any tips would be great.

1 answer:

Answer 0: (score: 1)

Here is sample code for what you asked. It is based on Python, Selenium, and the chromedriver executable.

    import csv
    import time

    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service
    from selenium.webdriver.common.by import By

    # Written for Python 3 and the Selenium 4 API. Older Selenium releases used
    # webdriver.Chrome('./chromedriver.exe') and find_elements_by_xpath(...).
    URL = "https://www.sportstats.ca/display-results.xhtml?raceid=4886"
    ROW_XPATH = "//table[@class='results overview-result']/tbody/tr[@role='row']"
    DETAIL_XPATH = ("//tr[@class='ui-expanded-row-content ui-widget-content view-details']"
                    "/td/div/div/table/tbody/tr")
    CSV_HEADING = ["", "", "BIB", "NAME", "CATEGORY", "RANK", "GENDER PLACE",
                   "CAT. PLACE", "GUN TIME", "SPLIT NAME", "SPLIT DISTANCE",
                   "SPLIT TIME", "PACE", "DISTANCE", "RACE TIME", "OVERALL (/814)",
                   "GENDER (/431)", "CATEGORY (/38)", "TIME OF DAY"]

    driver = webdriver.Chrome(service=Service('./chromedriver.exe'))
    try:
        # newline='' keeps the csv module from writing blank lines on Windows
        with open('demo_detail.csv', 'w', newline='', encoding='utf-8') as myfile:
            wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
            wr.writerow(CSV_HEADING)
            driver.get(URL)
            rows = driver.find_elements(By.XPATH, ROW_XPATH)
            for count, tr in enumerate(rows, start=1):
                # Collect the visible cells of the summary row
                table_td = tr.find_elements(By.TAG_NAME, "td")
                lst = [td.text for td in table_td]

                # The second cell holds the "+" toggle; click it to expand the row
                table_td[1].find_element(By.TAG_NAME, "div").click()
                time.sleep(5)  # crude fixed wait for the detail panel to load

                # Append every cell of the expanded detail table
                for demo_tr in driver.find_elements(By.XPATH, DETAIL_XPATH):
                    for demo_td in demo_tr.find_elements(By.TAG_NAME, "td"):
                        lst.append(demo_td.text)
                wr.writerow(lst)

                # Collapse the row again so only one detail panel is open at a time
                table_td[1].find_element(By.TAG_NAME, "div").click()
                time.sleep(5)
                print(count)
    except Exception as e:
        print(e)
    finally:
        # Always close the browser, whether or not scraping succeeded
        driver.quit()
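
The fixed `time.sleep(5)` pauses in the code above cost about ten seconds per row, even when the detail panel loads much sooner. One possible alternative is to poll until the expanded rows are actually present. The helper below is only a minimal, standard-library sketch of that idea; the name `wait_until` and its `timeout` and `poll` parameters are hypothetical and not part of Selenium.

```python
import time


def wait_until(condition, timeout=10.0, poll=0.25):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have elapsed without success. (`wait_until` is a hypothetical helper
    written for illustration, not a Selenium function.)
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        time.sleep(poll)
```

To use it in the loop, pass a `lambda` that looks up the expanded detail rows: Selenium's element lookup returns an empty list until they exist, so the helper returns as soon as the lookup finds something. Selenium also ships its own explicit-wait class, `WebDriverWait`, which is probably the better choice in real code; the sketch only shows the polling idea.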