Question

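I am trying to scrape phone numbers from a set of pages. One of the pages is this one (the URL is in the code below). Every page contains a link button with the text SEE PHONE NUMBER, and clicking that link reveals the phone number. I am trying to scrape that phone number. This is what I have tried so far: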

company_url = 'https://www.europages.co.uk/PORT-INTERNATIONAL-GMBH/00000004710372-508993001.html'
d = {}
try :
    options = webdriver.FirefoxOptions()
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--incognito')
    options.add_argument('--headless')
    driver = webdriver.Firefox(options=options)
    driver.get(company_url)
    link = driver.find_element_by_link_text('See phone number')
    link.click()
    driver.close()
    page = driver.page_source
    soup = bs(page, 'html.parser')
    tel_no = soup.find('div', {'class' : 'info-tel-num'})
    tel_no = tel_no.text
    d['telephone'] = tel_no
except Exception as e:
    print(f'Error encountered : {e}')

But every time, it prints this error from the exception block:

Error encountered : Message: Unable to locate element: See phone number

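This link button does not have any specific ID or class, so I cannot use find_element_by_id or find_element_by_class. This is what I found with inspect element on that button (before clicking):

And this is the inspect element result after clicking the button:

How can I scrape this phone number? What am I doing wrong?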

Answer 1

The desired element is a JavaScript-enabled element, so to click() it you need to induce WebDriverWait for element_to_be_clickable(). You can use either of the following solutions:

Using CSS_SELECTOR:

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[onclick^='EpGetInfoTel']"))).click()

Using XPATH:

WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[starts-with(@onclick, 'EpGetInfoTel') and text()='See phone number']"))).click()

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
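
For readers new to explicit waits, the idea behind WebDriverWait(...).until(...) can be sketched in plain Python: keep calling a condition until it returns something truthy or a timeout expires. This is only a conceptual sketch, not Selenium's implementation; the name wait_until and the use of LookupError as a stand-in for "element not found yet" are assumptions made for illustration. The 0.5 s default mirrors Selenium's default poll interval.

```python
import time


def wait_until(condition, timeout=20.0, poll=0.5):
    """Call condition() until it returns a truthy value or the timeout elapses.

    Hypothetical helper that mimics the polling idea behind WebDriverWait.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = condition()
            if result:
                return result
        except LookupError:
            # Stand-in for "element not found yet": swallow it and retry,
            # much as WebDriverWait ignores NoSuchElementException by default.
            pass
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll)


# Usage: a condition that only succeeds on the third attempt.
attempts = []


def ready():
    attempts.append(1)
    return "clicked" if len(attempts) >= 3 else False


print(wait_until(ready, timeout=2, poll=0.01))  # prints: clicked
```

The point for the question is that a single find_element call gives the page one chance to have the link ready, while a wait gives it up to the full timeout.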

To scrape the phone number, you can use the following line of code:

print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//a[starts-with(@onclick, 'EpGetInfoTel') and text()='See phone number']//following::div[1]"))).get_attribute("innerHTML"))

Console output:
```
+49 04 03 01 00 00
```

Answer 2

To click the link, you need either to bring it into the viewport or to execute a JavaScript command. Here is one way to do it.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

link = "https://www.europages.co.uk/PORT-INTERNATIONAL-GMBH/00000004710372-508993001.html"

with webdriver.Chrome() as driver:
    driver.get(link)
    elem = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR,"[itemprop='telephone'] > a")))
    driver.execute_script("arguments[0].click();",elem)
    phone = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR,"div.info-tel-num"))).text
    print(phone)

Answer 3

Use this to click on See phone number:

$("[itemprop='telephone'] a")[0].click();

And use the following to get the phone number value:

$("[itemprop='telephone'] [style='display: block;']")[0].innerText
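
The two JavaScript lines above can also be run from Python through driver.execute_script. The following is a minimal sketch under some assumptions: it swaps `$` for document.querySelector so that it does not depend on jQuery being loaded on the page, it keeps the selectors from this answer unchanged, and the function name read_phone_via_js is hypothetical. Because the number may appear a moment after the click, the read is retried until it returns non-empty text.

```python
import time

# Selectors are taken from the answer above; plain DOM calls replace `$`.
CLICK_JS = """document.querySelector("[itemprop='telephone'] a").click();"""
READ_JS = """
var el = document.querySelector("[itemprop='telephone'] [style='display: block;']");
return el ? el.innerText.trim() : "";
"""


def read_phone_via_js(driver, timeout=10.0, poll=0.5):
    """Click the link via JavaScript, then poll until the number is shown.

    `driver` is any object with Selenium's execute_script(script) method.
    """
    driver.execute_script(CLICK_JS)
    deadline = time.monotonic() + timeout
    while True:
        phone = driver.execute_script(READ_JS)
        if phone:
            return phone
        if time.monotonic() >= deadline:
            raise TimeoutError("phone number did not appear")
        time.sleep(poll)
```

With a real browser this would be called as read_phone_via_js(driver) after driver.get(...). If the link is not in the DOM yet, the click script fails with a JavaScript error, so waiting for the link first, as in the other answers, is still advisable.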

Answer 4

Use WebDriverWait and click the element with the following XPath. Then, if you want to continue with BeautifulSoup, get the page_source.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs
import time  # needed for time.sleep() below
company_url = 'https://www.europages.co.uk/PORT-INTERNATIONAL-GMBH/00000004710372-508993001.html'
d = {}
try :
    options = webdriver.FirefoxOptions()
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--incognito')
    options.add_argument('--headless')
    driver = webdriver.Firefox(options=options)
    driver.get(company_url)
    link =WebDriverWait(driver,10).until(EC.element_to_be_clickable((By.XPATH,'//a[contains(.,"See phone number")]')))
    link.click()
    time.sleep(2)
    page = driver.page_source
    driver.close()
    soup = bs(page, 'html.parser')
    tel_no = soup.find('div', {'class' : 'info-tel-num'})
    tel_no = tel_no.text
    d['telephone'] = tel_no
except Exception as e:
    print(f'Error encountered : {e}')


print(d)

Output on the console:

{'telephone': '+49 04 03 01 00 00'}
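
One detail worth noting in the code above: page_source is read before driver.close(), whereas the question's code closes the driver first and only then asks for the page source, which fails once the browser window is gone. A minimal sketch of that ordering, with try/finally so the browser is shut down even when the click step raises, could look like this. The names page_source_after, make_driver and interact are hypothetical, and quit() is used here in place of close() to end the whole session.

```python
def page_source_after(make_driver, url, interact):
    """Open url, run interact(driver), and return the resulting page source.

    make_driver: zero-argument callable returning a WebDriver-like object.
    interact:    callable that performs the click (and any waiting).
    """
    driver = make_driver()
    try:
        driver.get(url)
        interact(driver)
        # Read the source while the browser session is still open.
        return driver.page_source
    finally:
        # Runs on success and on error, so no browser process is left behind.
        driver.quit()
```

With Selenium this might be called as page_source_after(lambda: webdriver.Firefox(options=options), company_url, click_phone_link), where click_phone_link is a hypothetical function holding the WebDriverWait click shown above.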

Python: Unable to locate element while scraping with Selenium

4 answers: