I'm trying to scrape some addresses from a page. I only want to extract the title attribute of each link, because that title is the address I need.
Here is the HTML:
<li class="brd-bottom-1">
<a href="#" title="AVDA. A. RENDIC ESQ. AVDA. P. A. CERDA" class="position-relative vue-accordion-link-stations">
<i class="fas fa-map-marker-alt position-absolute s-display-none"></i>
<i class="fas fa-caret-down position-absolute s-display-none"></i>
<div class="font-size-16 marg-bottom-5">Avda. a. rendic esq. avda. p. a. cerda
<i class="fas fa-caret-down marg-left-5 m-display-none l-display-none"></i></div>
Here is my Python code:
driver = webdriver.Chrome(r"C:\Users\heju8004\Documents\Archivos de Python\chromedriver.exe")
driver.get("https://ww2.copec.cl/stations?check=punto")
for i in range(15):
    driver.get("https://ww2.copec.cl/stations?check=punto")
    driver.find_element_by_xpath("//*[@id='root']/div[1]/div/ul/li[1]/a").click()
    string = "//*[@id='root']/div[1]/div/ul/li[1]/ul/li/span[{}]".format(i+1)
    driver.find_element_by_xpath(string).click()
    time.sleep(2)
    resultSet = driver.find_element_by_xpath("//*[@id='root']/div[2]/div[2]/div[2]/ul")
    options = resultSet.find_elements_by_tag_name("li")
    for option in options:
        otraOption = option.find_element_by_xpath("//a")
        print(otraOption.title)
Answer 0 (score: 1)
The fastest and simplest solution is to use the requests package instead of Selenium. You can get all stations for all regions in a single request; each station record includes fields such as the region and title:
import requests

url = 'https://ww2.copec.cl/stations/get_stations.json?pagoclick_filter=true&geohash=66jc8&limit=2000'
response = requests.get(url)
stations = response.json()["stations"]
for station in stations:
    print("region: %s, title: %s" % (station["region"], station["title"]))
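As a small follow-up to the loop above, the downloaded list can be grouped by region. This is only a sketch: the helper name `titles_by_region` and the sample records are made up for illustration, and it assumes each station is a dict with "region" and "title" keys, as the loop above does.

```python
from collections import defaultdict


def titles_by_region(stations):
    """Group station titles by region (hypothetical helper, not part of any API)."""
    grouped = defaultdict(list)
    for station in stations:
        # .get() avoids a KeyError if a record lacks one of the assumed keys.
        region = station.get("region", "unknown")
        title = station.get("title")
        if title:  # skip records whose title is missing or empty
            grouped[region].append(title)
    return dict(grouped)


# Made-up sample records standing in for response.json()["stations"].
sample = [
    {"region": "Region A", "title": "AVDA. A. RENDIC ESQ. AVDA. P. A. CERDA"},
    {"region": "Region A", "title": "SECOND STATION"},
    {"region": "Region B", "title": "THIRD STATION"},
    {"region": "Region B", "title": ""},
]

grouped = titles_by_region(sample)
for region, titles in grouped.items():
    print(region, len(titles))
```

With the real response you would pass `stations` instead of `sample`; the actual region values depend on what the endpoint returns.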
Answer 1 (score: 0)
Find the element by tag name and read its title attribute:
driver = webdriver.Chrome(r"C:\Users\heju8004\Documents\Archivos de Python\chromedriver.exe")
# ...
for i in range(15):
    # ... etc
    current_title = driver.find_element_by_tag_name('a').get_attribute('title')
    # ...
As mentioned in the comments, a better approach is to grab all the <a> elements first and then extract what you need from them:
driver = webdriver.Chrome(r"C:\Users\heju8004\Documents\Archivos de Python\chromedriver.exe")
# ... navigate to page
a_elements = driver.find_elements_by_tag_name('a')  # notice plural 'elements'
titles = []
for element in a_elements:
    try:
        titles.append(element.get_attribute('title'))
    except Exception as e:
        print(f'No element found for {element} with error: {e}')
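Collecting the title of every <a> on the page will likely pick up links that have no title at all, so the resulting list may need tidying. The sketch below is one way to do that; `clean_titles` is a hypothetical helper, and it assumes that a link without a title yields either an empty string or None.

```python
def clean_titles(raw_titles):
    """Drop missing or blank titles and duplicates, keeping first-seen order."""
    seen = set()
    cleaned = []
    for title in raw_titles:
        if title is None:
            continue
        title = title.strip()
        if title and title not in seen:
            seen.add(title)
            cleaned.append(title)
    return cleaned


# Made-up input standing in for the `titles` list built above.
raw = [
    "",
    None,
    "AVDA. A. RENDIC ESQ. AVDA. P. A. CERDA",
    "  ",
    "AVDA. A. RENDIC ESQ. AVDA. P. A. CERDA",
]
print(clean_titles(raw))
```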
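Answer 2 (score: 0)
All I would do is make a simple change to your code. Just loop over the "li" elements you are already iterating and get all the <a> tags inside each one. Note that I use "find_elements_by_tag_name", with the plural "s", to get all the tags.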
import time
from selenium import webdriver

driver = webdriver.Chrome(r"C:\Users\heju8004\Documents\Archivos de Python\chromedriver.exe")
driver.get("https://ww2.copec.cl/stations?check=punto")
for i in range(15):
    driver.get("https://ww2.copec.cl/stations?check=punto")
    driver.find_element_by_xpath("//*[@id='root']/div[1]/div/ul/li[1]/a").click()
    string = "//*[@id='root']/div[1]/div/ul/li[1]/ul/li/span[{}]".format(i+1)
    driver.find_element_by_xpath(string).click()
    time.sleep(2)
    resultSet = driver.find_element_by_xpath("//*[@id='root']/div[2]/div[2]/div[2]/ul")
    options = resultSet.find_elements_by_tag_name("li")
    for option in options:
        # find_elements_* returns a list, so read the title from each <a> in turn
        for otraOption in option.find_elements_by_tag_name("a"):
            print(otraOption.get_attribute('title'))
Once you have all the tags, you can extract the title from each of them.
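To illustrate what "extract the title from each tag" amounts to, here is a sketch that does the same thing on a static HTML string using only the standard library's `html.parser`, rather than Selenium. The class name `AnchorTitleParser` is hypothetical, and the markup is a shortened, closed-off copy of the snippet from the question; in practice the string could come from something like `driver.page_source`.

```python
from html.parser import HTMLParser


class AnchorTitleParser(HTMLParser):
    """Collect the title attribute of every <a> tag that has one."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the opening tag.
        if tag == "a":
            title = dict(attrs).get("title")
            if title:
                self.titles.append(title)


# Shortened version of the question's markup, with closing tags added.
html = """
<li class="brd-bottom-1">
<a href="#" title="AVDA. A. RENDIC ESQ. AVDA. P. A. CERDA" class="position-relative vue-accordion-link-stations">
<div class="font-size-16 marg-bottom-5">Avda. a. rendic esq. avda. p. a. cerda</div>
</a>
</li>
"""

parser = AnchorTitleParser()
parser.feed(html)
print(parser.titles)
```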