How do I extract a phone number from a web page?

Asked: 2019-08-13 09:09:11

Tags: python selenium xpath css-selectors webdriverwait

I'm trying to scrape some phone numbers from a website, and I don't know why I keep getting the wrong output.

I'm running the code in Jupyter:

from selenium import webdriver

url = 'https://www.europages.co.uk/KIDDYSTORES/00000003902113-191369001.html'
tel = []

# Setup webdriver
driver = webdriver.Chrome('.\\chromedriver.exe')
driver.set_page_load_timeout(10)
driver.get(url)

driver.execute_script("window.scrollTo(0, 720)") 
button = driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/h3')[0]
# //*[@id="content"]/aside/div/div[1]/h3
button.click()
if len(driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/div[6]/ul/li/div/a'))!=0:      
    button = driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/div[6]/ul/li/div/a')[0]
elif len(driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/div[7]/ul/li/div[1]/a'))!=0:
    button = driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/div[7]/ul/li/div[1]/a')[0]
elif len(driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/div[6]/ul/li/div[1]/a'))!=0:
    button = driver.find_elements_by_xpath('//*[@id="content"]/aside/div/div[1]/div[6]/ul/li/div[1]/a')[0]
button.click()


print(driver.find_element_by_xpath('//*[@id="content"]/aside/div/div[1]/div[6]/ul/li/div/div').get_attribute('innerHTML'))
driver.find_element_by_xpath('//*[@id="content"]/aside/div/div[1]/div[6]/ul/li/div/div').get_attribute('innerHTML')

The code above is what I'm running. I expected "+33 141 57 22 81", but the actual output is "\n\t\t\t\t\t\t\t\t".

print(driver.find_element_by_class_name('team-sh-tel').get_attribute('innerHTML'))

However, when I run this line separately in another Jupyter cell, it prints the expected phone number.

1 answer:

Answer 0 (score: 1)

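The behaviour you describe suggests that the number is filled in only some time after the click, so reading innerHTML immediately returns just whitespace. To extract the phone number from https://www.europages.co.uk/KIDDYSTORES/00000003902113-191369001.html, you need to induce WebDriverWait for element_to_be_clickable(). You can use either of the following locator strategies: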

  • Code block using CSS_SELECTOR:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    driver.get("https://www.europages.co.uk/KIDDYSTORES/00000003902113-191369001.html")
    driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.click-tel.icon.icon-telephone"))))
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "div.click-tel.icon.icon-telephone"))).click()
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CLASS_NAME, "info-tel-num"))).get_attribute("innerHTML"))
    
  • Code block using XPATH:

    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    driver.get("https://www.europages.co.uk/KIDDYSTORES/00000003902113-191369001.html")
    driver.execute_script("return arguments[0].scrollIntoView(true);", WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='click-tel icon icon-telephone']//a[text()='See phone number']"))))
    WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.XPATH, "//div[@class='click-tel icon icon-telephone']//a[text()='See phone number']"))).click()
    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='click-tel icon icon-telephone']//a[text()='See phone number']//following::div[1]"))).get_attribute("innerHTML"))
    
  • Console output:

    +33 141 57 22 81
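
If the element becomes visible before its text is filled in, a visibility wait alone could still return the whitespace-only string from the question. One possible refinement, offered here as a sketch and not as part of the approach above, is to wait until the innerHTML itself is non-blank. It relies on WebDriverWait.until() accepting any callable that takes the driver and returning that callable's first truthy result. The helper name non_blank_inner_html is hypothetical; the class name info-tel-num is taken from the CSS_SELECTOR block.

```python
def non_blank_inner_html(by, value):
    """Build a wait condition for the element located by (by, value).

    The returned callable yields the element's stripped innerHTML once it
    contains something other than whitespace, and False until then.
    (Hypothetical helper, not part of Selenium.)
    """
    def condition(driver):
        raw = driver.find_element(by, value).get_attribute("innerHTML")
        text = (raw or "").strip()  # get_attribute may return None
        return text if text else False
    return condition

# Usage with a live driver (needs the By and WebDriverWait imports):
# phone = WebDriverWait(driver, 20).until(
#     non_blank_inner_html(By.CLASS_NAME, "info-tel-num"))
# print(phone)
```

Because the condition returns the stripped string, the value that until() hands back is already free of the leading newline and tabs.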