Get the title from <a> in an iteration (Selenium, Python)

Time: 2019-02-26 20:00:20

Tags: python selenium selenium-webdriver selenium-chromedriver

I'm trying to scrape some addresses from a page. I only want to extract the title attribute of each <a> element, because that title is the address I'm after.

Here is the HTML

<li class="brd-bottom-1">
   <a href="#" title="AVDA. A. RENDIC ESQ. AVDA. P. A. CERDA" class="position-relative vue-accordion-link-stations">
     <i class="fas fa-map-marker-alt position-absolute s-display-none"></i> 
     <i class="fas fa-caret-down position-absolute s-display-none"></i> 
     <div class="font-size-16 marg-bottom-5">Avda. a. rendic esq. avda. p. a. cerda 
        <i class="fas fa-caret-down marg-left-5 m-display-none l-display-none"></i></div>  

Here is my Python code

    driver = webdriver.Chrome(r"C:\Users\heju8004\Documents\Archivos de Python\chromedriver.exe")
    driver.get("https://ww2.copec.cl/stations?check=punto")
    for i in range(15):
        driver.get("https://ww2.copec.cl/stations?check=punto")
        driver.find_element_by_xpath("//*[@id='root']/div[1]/div/ul/li[1]/a").click()
        string = "//*[@id='root']/div[1]/div/ul/li[1]/ul/li/span[{}]".format(i+1)     
        driver.find_element_by_xpath(string).click()
        time.sleep(2)
        resultSet = driver.find_element_by_xpath("//*[@id='root']/div[2]/div[2]/div[2]/ul")
        options = resultSet.find_elements_by_tag_name("li")
        for option in options:
            otraOption = option.find_element_by_xpath("//a")
            print(otraOption.title)

3 answers:

Answer 0 (score: 1)

The quickest and simplest solution is to use the requests package instead of Selenium. You can get all stations for all regions in a single request; each station record carries fields such as region and title:

import requests

url = 'https://ww2.copec.cl/stations/get_stations.json?pagoclick_filter=true&geohash=66jc8&limit=2000'
response = requests.get(url)

stations = response.json()["stations"]
for station in stations:
    print("region: %s, title: %s" % (station["region"], station["title"]))
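
If you then want the addresses grouped per region, a small helper like the one below might do. This is only a sketch: `titles_by_region` is a hypothetical name, the sample records and region labels are made up for illustration, and it assumes each station is a dict with the `region` and `title` keys used above (records without a title are skipped).

```python
from collections import defaultdict


def titles_by_region(stations):
    """Group station titles by region (hypothetical helper, not part of any API)."""
    grouped = defaultdict(list)
    for station in stations:
        title = station.get("title")
        if title:  # skip records that carry no title
            grouped[station.get("region", "unknown")].append(title)
    return dict(grouped)


# Made-up sample shaped like the "stations" list in the JSON response (assumption).
sample = [
    {"region": "Region A", "title": "AVDA. A. RENDIC ESQ. AVDA. P. A. CERDA"},
    {"region": "Region A", "title": "HYPOTHETICAL ADDRESS 2"},
    {"region": "Region B", "title": "HYPOTHETICAL ADDRESS 3"},
    {"region": "Region B"},  # no title, ignored
]

for region, titles in titles_by_region(sample).items():
    print(region, titles)
```

With the real response you would pass `response.json()["stations"]` instead of `sample`.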

Answer 1 (score: 0)

Find by tag name:

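from selenium import webdriver

driver = webdriver.Chrome(r"C:\Users\heju8004\Documents\Archivos de Python\chromedriver.exe")
# ...
for i in range(15):
    # ... etc
    # get_attribute reads an HTML attribute; WebElement has no .title property
    current_title = driver.find_element_by_tag_name('a').get_attribute('title')
    # ...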

Edit:

As noted in the comments, a better approach is to collect all the <a> elements first and then extract what you need from them:

from selenium import webdriver

driver = webdriver.Chrome(r"C:\Users\heju8004\Documents\Archivos de Python\chromedriver.exe")
# ... navigate to page
a_elements = driver.find_elements_by_tag_name('a')  # notice plural 'elements'
titles = []
for element in a_elements:
    try:
        titles.append(element.get_attribute('title'))
    except Exception as e:
        print(f'Could not read title from {element}, error: {e}')

Answer 2 (score: 0)

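So what I would do is make a simple modification to your code. Just loop over the "li" elements you are already iterating through and get all the <a> tags inside each one. Note that I use "find_elements_by_tag_name"; mind the plural "s", which returns every matching tag as a list.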

    import time
    from selenium import webdriver

    driver = webdriver.Chrome(r"C:\Users\heju8004\Documents\Archivos de Python\chromedriver.exe")
    driver.get("https://ww2.copec.cl/stations?check=punto")
    for i in range(15):
        driver.get("https://ww2.copec.cl/stations?check=punto")
        driver.find_element_by_xpath("//*[@id='root']/div[1]/div/ul/li[1]/a").click()
        string = "//*[@id='root']/div[1]/div/ul/li[1]/ul/li/span[{}]".format(i+1)
        driver.find_element_by_xpath(string).click()
        time.sleep(2)
        resultSet = driver.find_element_by_xpath("//*[@id='root']/div[2]/div[2]/div[2]/ul")
        options = resultSet.find_elements_by_tag_name("li")
        for option in options:
            # find_elements_* returns a list, so read the title from each <a> in turn
            for otraOption in option.find_elements_by_tag_name("a"):
                print(otraOption.get_attribute('title'))

Once you have all the tags, you can extract the title from each of them.
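
For what it's worth, here is why scoping the lookup to each "li" matters: an XPath that starts with `//` is evaluated from the document root even when it is called on an element, so `option.find_element_by_xpath("//a")` returns the first link on the page on every iteration, while `.//a` stays inside the current element. The sketch below illustrates the difference without a browser, using the standard library's `xml.etree.ElementTree` as a stand-in for Selenium. The markup and the second address are made up, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup loosely modelled on the result list (assumption).
SAMPLE = """
<ul>
  <li><a href="#" title="AVDA. A. RENDIC ESQ. AVDA. P. A. CERDA">first</a></li>
  <li><a href="#" title="HYPOTHETICAL ADDRESS 2">second</a></li>
</ul>
""".strip()


def titles_per_item(markup):
    """Search for <a> relative to each <li> and collect the title attributes."""
    root = ET.fromstring(markup)
    titles = []
    for li in root.findall("li"):
        for a in li.findall(".//a"):  # relative path: only links inside this <li>
            titles.append(a.get("title"))
    return titles


def first_link_in_document(markup):
    """Mimic a document-wide lookup: always the first <a>, whatever item we are on."""
    root = ET.fromstring(markup)
    return root.find(".//a").get("title")


print(titles_per_item(SAMPLE))
print(first_link_in_document(SAMPLE))
```

In Selenium the equivalent per-item lookup would be `option.find_element_by_xpath(".//a").get_attribute("title")`.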