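I am trying to scrape phone numbers from a set of pages; the URL of one of them appears in the code below. Every page contains a link button with the text SEE PHONE NUMBER,
and clicking that link reveals the phone number. That revealed number is what I want to scrape. Here is what I have tried so far: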
company_url = 'https://www.europages.co.uk/PORT-INTERNATIONAL-GMBH/00000004710372-508993001.html'
d = {}
try:
    options = webdriver.FirefoxOptions()
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--incognito')
    options.add_argument('--headless')
    driver = webdriver.Firefox(options=options)
    driver.get(company_url)
    link = driver.find_element_by_link_text('See phone number')
    link.click()
    driver.close()
    page = driver.page_source
    soup = bs(page, 'html.parser')
    tel_no = soup.find('div', {'class' : 'info-tel-num'})
    tel_no = tel_no.text
    d['telephone'] = tel_no
except Exception as e:
    print(f'Error encountered : {e}')
But every time, it prints this error from the except block:
Error encountered : Message: Unable to locate element: See phone number
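The link button has no specific id or class, so I cannot use find_element_by_id or find_element_by_class_name. This is what I found by inspecting the button (before clicking):
This is the inspect-element result after clicking the button: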
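Answer 0 (score: 3)
The desired element is a JavaScript-enabled element, so before calling click() on it you need to induce WebDriverWait for element_to_be_clickable(). You can use either of the following solutions:
Using CSS_SELECTOR: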
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "a[onclick^='EpGetInfoTel']"))).click()
Using XPATH:
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//a[starts-with(@onclick, 'EpGetInfoTel') and text()='See phone number']"))).click()
Note: you have to add the following imports:
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
To scrape the phone number, you can use the following line of code:
print(WebDriverWait(driver, 5).until(EC.visibility_of_element_located((By.XPATH, "//a[starts-with(@onclick, 'EpGetInfoTel') and text()='See phone number']//following::div[1]"))).get_attribute("innerHTML"))
Console output:
+49 04 03 01 00 00
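To illustrate what an explicit wait such as WebDriverWait(...).until(...) does conceptually, here is a minimal sketch in plain Python. It is not Selenium's implementation: wait_until and phone_is_visible are hypothetical names, and the real WebDriverWait additionally ignores NoSuchElementException while polling and raises its own TimeoutException. The idea is simply to call a condition repeatedly until it returns something truthy or the timeout expires.

```python
import time


def wait_until(condition, timeout=20.0, poll_frequency=0.5):
    """Call condition() repeatedly until it returns a truthy value.

    Hypothetical helper that mimics the polling idea behind an explicit wait.
    Raises the built-in TimeoutError if the deadline passes first.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(poll_frequency)


# Simulated page state: the number only becomes "visible" on the third poll.
attempts = {"count": 0}


def phone_is_visible():
    attempts["count"] += 1
    return "+49 04 03 01 00 00" if attempts["count"] >= 3 else None


print(wait_until(phone_is_visible, timeout=5, poll_frequency=0.1))
```

This is why the explicit wait helps here: the lookup is retried while the page's JavaScript is still running, instead of failing on the first attempt.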
Answer 1 (score: 0)
To click the link, you either need to bring it into the viewport or click it with a JavaScript command. Here is one way to do it:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
link = "https://www.europages.co.uk/PORT-INTERNATIONAL-GMBH/00000004710372-508993001.html"
with webdriver.Chrome() as driver:
    driver.get(link)
    elem = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "[itemprop='telephone'] > a")))
    driver.execute_script("arguments[0].click();", elem)
    phone = WebDriverWait(driver, 10).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.info-tel-num"))).text
    print(phone)
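The code above takes the JavaScript-click route. The other option mentioned, bringing the link into the viewport first, could look roughly like the sketch below. scroll_into_view_and_click is a hypothetical helper name; it assumes driver is an open Selenium WebDriver and element is the link already located (for example the elem from the wait above). scrollIntoView is a standard DOM method.

```python
def scroll_into_view_and_click(driver, element):
    """Scroll the element to the middle of the viewport, then click it natively.

    Hypothetical helper: `driver` is assumed to be a Selenium WebDriver and
    `element` a WebElement that was located beforehand.
    """
    # Ask the browser to scroll so the element is vertically centred.
    driver.execute_script("arguments[0].scrollIntoView({block: 'center'});", element)
    # A regular WebDriver click, which now targets a visible element.
    element.click()
```

A native click after scrolling behaves more like a real user than arguments[0].click(), which matters if the page checks for overlays or visibility.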
Answer 2 (score: 0)
Use this to click See phone number:
$("[itemprop='telephone'] a")[0].click();
and use the following to get the phone number value:
$("[itemprop='telephone'] [style='display: block;']")[0].innerText
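The two snippets above are meant for the browser console. If you want to drive the same steps from Python, one possible sketch is to pass equivalent JavaScript to Selenium's execute_script. The helper names click_phone_link and read_phone are hypothetical, driver is assumed to be an open WebDriver on the company page, and document.querySelector is used instead of $ so the code does not depend on jQuery being available. The div.info-tel-num selector is taken from the other answers rather than from this one.

```python
# JavaScript that clicks the first link inside the telephone block.
CLICK_JS = "document.querySelector(\"[itemprop='telephone'] a\").click();"

# JavaScript that returns the revealed number, or null if it is not there yet.
READ_JS = (
    "var el = document.querySelector('div.info-tel-num');"
    "return el ? el.innerText : null;"
)


def click_phone_link(driver):
    """Click the 'See phone number' link via JavaScript (hypothetical helper)."""
    driver.execute_script(CLICK_JS)


def read_phone(driver):
    """Return the phone number text, or None if the div is not present yet."""
    return driver.execute_script(READ_JS)
```

Because the number may only appear a moment after the click, read_phone might return None on the first call; retrying it or using an explicit wait between the two calls would be the safer pattern.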
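Answer 3 (score: 0)
Use WebDriverWait and click the element located by the XPath below. Then, if you want to continue with BeautifulSoup, grab page_source after the click and before closing the driver.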
import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from bs4 import BeautifulSoup as bs
company_url = 'https://www.europages.co.uk/PORT-INTERNATIONAL-GMBH/00000004710372-508993001.html'
d = {}
try:
    options = webdriver.FirefoxOptions()
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--incognito')
    options.add_argument('--headless')
    driver = webdriver.Firefox(options=options)
    driver.get(company_url)
    # Wait until the link is clickable instead of looking it up immediately
    link = WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, '//a[contains(.,"See phone number")]')))
    link.click()
    # Give the page a moment to reveal the number
    time.sleep(2)
    # Read the page source before closing the browser
    page = driver.page_source
    driver.close()
    soup = bs(page, 'html.parser')
    tel_no = soup.find('div', {'class' : 'info-tel-num'})
    tel_no = tel_no.text
    d['telephone'] = tel_no
except Exception as e:
    print(f'Error encountered : {e}')
print(d)
Output on the console:
{'telephone': '+49 04 03 01 00 00'}
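Note that soup.find(...) returns None when the div is missing, in which case tel_no.text raises an AttributeError. As a rough illustration of the extraction step with that case handled, here is a sketch that uses only the standard library's html.parser in place of BeautifulSoup (a deliberate swap so that it runs without extra packages). TelDivParser, extract_phone and the sample HTML string are hypothetical; only the class name info-tel-num comes from the code above.

```python
from html.parser import HTMLParser


class TelDivParser(HTMLParser):
    """Collect the text of the first <div> whose class list contains target_class."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0        # > 0 while we are inside the target div
        self.chunks = []      # text fragments found inside it
        self.done = False     # True once the target div has been closed

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        if self.depth:
            if tag == "div":
                self.depth += 1   # nested div inside the target
        elif tag == "div":
            classes = (dict(attrs).get("class") or "").split()
            if self.target_class in classes:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_phone(page_source, target_class="info-tel-num"):
    """Return the stripped text of the target div, or None if it is absent or empty."""
    parser = TelDivParser(target_class)
    parser.feed(page_source)
    text = "".join(parser.chunks).strip()
    return text or None


# Hypothetical markup that only imitates the structure described in the question.
sample = (
    '<div itemprop="telephone"><a>See phone number</a>'
    '<div class="info-tel-num" style="display: block;">+49 04 03 01 00 00</div></div>'
)
print(extract_phone(sample))                         # prints: +49 04 03 01 00 00
print(extract_phone('<div class="other">x</div>'))   # prints: None
```

With BeautifulSoup the equivalent guard is simply checking whether the result of soup.find(...) is None before reading .text.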