Python: Unable to locate element when scraping with Selenium

Time: 2019-07-23 12:30:30

Tags: python selenium xpath css-selectors webdriverwait

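I am trying to scrape phone numbers from a set of pages. One of the pages is the one whose URL appears in the code below. Every page contains a link button with the text SEE PHONE NUMBER, and clicking that link reveals the phone number. I am trying to scrape that phone number. This is what I have tried so far: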

company_url = 'https://www.europages.co.uk/PORT-INTERNATIONAL-GMBH/00000004710372-508993001.html'
d = {}
try :
    options = webdriver.FirefoxOptions()
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--incognito')
    options.add_argument('--headless')
    driver = webdriver.Firefox(options=options)
    driver.get(company_url)
    link = driver.find_element_by_link_text('See phone number')
    link.click()
    driver.close()
    page = driver.page_source
    soup = bs(page, 'html.parser')
    tel_no = soup.find('div', {'class' : 'info-tel-num'})
    tel_no = tel_no.text
    d['telephone'] = tel_no
except Exception as e:
    print(f'Error encountered : {e}')

But every time, it prints this error from the except block:

Error encountered : Message: Unable to locate element: See phone number

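This link button has no specific ID or class, so I cannot use find_element_by_id or find_element_by_class_name. This is what I found by inspecting the button (before clicking):

[Screenshot: inspect element result]

And this is the inspect element result after clicking the button:

[Screenshot: after clicking]

How can I scrape this phone number? What am I doing wrong?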

4 answers:

Answer 0 (score: 3)

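The desired element is a JavaScript-enabled element, so before calling click() on it you have to use WebDriverWait with the element_to_be_clickable() condition. You can use either of the following solutions: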

  • Using CSS_SELECTOR:

    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[onclick^='EpGetInfoTel']"))).click()
    
  • Using XPATH:

    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[starts-with(@onclick, 'EpGetInfoTel') and text()='See phone number']"))).click()
    
  • Note: you have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    
  • To scrape the phone number, you can use the following line of code:

    print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//a[starts-with(@onclick, 'EpGetInfoTel') and text()='See phone number']//following::div[1]"))).get_attribute("innerHTML"))
    
  • Console output:

    +49 04 03 01 00 00
    
  • Browser snapshot: [Screenshot: phone]
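To illustrate what an explicit wait does conceptually, here is a simplified sketch in plain Python. It is not Selenium's actual implementation: `wait_until` and `link_is_ready` are hypothetical names, and `LookupError` merely stands in for Selenium's "element not found" exception. The idea is that the condition is polled until it returns something truthy or the timeout expires, which is why an element that appears a moment after page load can still be found.

```python
import time


def wait_until(condition, timeout=20.0, poll_interval=0.5):
    """Call condition() repeatedly until it returns a truthy value or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": ignore it and keep polling.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(poll_interval)


# Simulated page: the link only becomes available on the third look-up.
attempts = {"count": 0}


def link_is_ready():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise LookupError("link not rendered yet")
    return "See phone number"


print(wait_until(link_is_ready, timeout=5, poll_interval=0.01))
```

A single immediate look-up, as in the question's code, corresponds to calling the condition exactly once and giving up on the first failure.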

Answer 1 (score: 0)

To click that link, you need to either bring it into the viewport or click it through a JavaScript command. Here is one way to do it:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://www.europages.co.uk/PORT-INTERNATIONAL-GMBH/00000004710372-508993001.html"

with webdriver.Chrome() as driver:
    driver.get(link)
    elem = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"[itemprop='telephone'] > a")))
    driver.execute_script("arguments[0].click();",elem)
    phone = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"div.info-tel-num"))).text
    print(phone)
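The code above takes the JavaScript-click route. The other option mentioned, bringing the link into the viewport first, could be sketched as follows. The helper name `scroll_into_view_and_click` is hypothetical, and whether a native click succeeds on this particular page after scrolling has not been verified here (an overlay could still intercept it).

```python
def scroll_into_view_and_click(driver, element):
    """Scroll the element to the middle of the viewport, then click it natively."""
    # scrollIntoView is a standard DOM method; block: 'center' keeps the element
    # away from the top edge, where fixed headers often sit.
    driver.execute_script(
        "arguments[0].scrollIntoView({block: 'center'});", element
    )
    element.click()


# Intended use with the names from the answer above (requires a real browser):
#   elem = WebDriverWait(driver, 10).until(
#       EC.element_to_be_clickable((By.CSS_SELECTOR, "[itemprop='telephone'] > a")))
#   scroll_into_view_and_click(driver, elem)
```

Compared with `arguments[0].click()`, a native click after scrolling stays closer to what a real user does, at the cost of failing if something covers the link.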

Answer 2 (score: 0)

Use this to click See phone number:

$("[itemprop='telephone'] a")[0].click();

And use the following to get the phone number value:

$("[itemprop='telephone'] [style='display: block;']")[0].innerText
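The two snippets above are meant for the browser console. A rough sketch of running the same idea from Python through `execute_script` is shown below. It uses plain `document.querySelector` instead of `$` so that it does not depend on jQuery being present, and it reads `div.info-tel-num` (the class used in the other answers) rather than matching on the inline style; both substitutions are assumptions, and the function names are hypothetical.

```python
# JavaScript executed in the page; selectors are taken from the answers in this thread.
CLICK_JS = "document.querySelector(\"[itemprop='telephone'] a\").click();"
READ_JS = (
    "var el = document.querySelector('div.info-tel-num');"
    "return el ? el.innerText.trim() : null;"
)


def click_phone_link(driver):
    """Click the 'See phone number' link via JavaScript."""
    driver.execute_script(CLICK_JS)


def read_phone_number(driver):
    """Return the revealed phone number, or None/'' if it is not there yet."""
    return driver.execute_script(READ_JS)


# Intended use (requires a real browser):
#   click_phone_link(driver)
#   phone = WebDriverWait(driver, 10).until(read_phone_number)
```

Passing `read_phone_number` to `WebDriverWait(...).until` works because `until` calls its argument with the driver and keeps polling while the result is falsy.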

Answer 3 (score: 0)

Use WebDriverWait and click the element located by the XPath below. Then, if you want to continue with BeautifulSoup, read page_source afterwards:

import time  # needed for time.sleep() below

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs
company_url = 'https://www.europages.co.uk/PORT-INTERNATIONAL-GMBH/00000004710372-508993001.html'
d = {}
try :
    options = webdriver.FirefoxOptions()
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--incognito')
    options.add_argument('--headless')
    driver = webdriver.Firefox(options=options)
    driver.get(company_url)
    link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//a[contains(.,"See phone number")]')))
    link.click()
    time.sleep(2)
    page = driver.page_source
    driver.close()
    soup = bs(page, 'html.parser')
    tel_no = soup.find('div', {'class' : 'info-tel-num'})
    tel_no = tel_no.text
    d['telephone'] = tel_no
except Exception as e:
    print(f'Error encountered : {e}')


print(d)

Output on the console:

{'telephone': '+49 04 03 01 00 00'}
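Note that this version reads driver.page_source before calling driver.close(), whereas the code in the question closed the driver first. One way to make that ordering hard to get wrong, and to release the browser even when scraping raises, is a try/finally wrapper. The following is only a sketch: `scrape_with_cleanup`, `make_driver`, and `scrape` are hypothetical names, not part of Selenium.

```python
def scrape_with_cleanup(make_driver, url, scrape):
    """Open url with a fresh driver, run scrape(driver), and always quit the driver."""
    driver = make_driver()
    try:
        driver.get(url)
        # Everything that needs the live browser (clicks, page_source) happens here.
        return scrape(driver)
    finally:
        # Runs on success and on error, so no browser process is left behind.
        driver.quit()


# Intended use with the names from the answer above (requires a real browser):
#   page = scrape_with_cleanup(
#       lambda: webdriver.Firefox(options=options),
#       company_url,
#       lambda driver: driver.page_source,
#   )
```

Because the page source is returned from inside the try block, parsing with BeautifulSoup can safely happen after the browser is gone.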