How to scrape numbers after clicking a button on multiple pages?

Posted: 2019-04-01 18:56:27

Tags: python selenium web-scraping

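I previously asked how to click a button on a page. The answer worked the first time, but I realized it only works some of the time. The problem is that I have multiple pages: for some pages I get the numbers, but for others I get nothing. Is there a way to get all the data I need? This project is my final exam for an introductory Python course.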

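The button that needs to be clicked is in the top-right corner of the page and shows the text "Prikaži broj". Here is my attempt, but it doesn't work the way I want: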

from random import randint
from time import sleep

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.common.by import By

condos = [
'https://www.nekretnine.rs/stambeni-objekti/stanovi/vracar-lokacija-juzni-bulevar-adresa-vojvode-hrvoja-beograd/1958955/',
'https://www.nekretnine.rs/stambeni-objekti/stanovi/vozdovac-autokomanda-trise-kaclerovica-90m2-trise-kaclerovica/NkvU3_gZyb6/',
'https://www.nekretnine.rs/stambeni-objekti/stanovi/vracar-prote-mateje-78m2-id1187/NkwQVDgJqsw/',
'https://www.nekretnine.rs/stambeni-objekti/stanovi/palilula-botanicka-basta-bulevar-despota-stefana-60m2-bulevar-despota-stefana/1734451/',
'https://www.nekretnine.rs/stambeni-objekti/stanovi/palilula-postanska-stedionica-dalmatinska-94m2-dalmatinska/Nk1bTYWifZj/',
'https://www.nekretnine.rs/stambeni-objekti/stanovi/stari-grad-kalemegdan-strahinjica-bana-37m2-strahinjica-bana/NklcRCutVNB/',
'https://www.nekretnine.rs/stambeni-objekti/stanovi/palilula-borca-moravske-divizije-73m2-moravske-divizije/207667/',
'https://www.nekretnine.rs/stambeni-objekti/stanovi/palilula-visnjicka-banja-slobodana-jovanovica-75m2-slobodana-jovanovica/Nk2nu-zdbzW/',
'https://www.nekretnine.rs/stambeni-objekti/stanovi/zvezdara-mirijevo-jovanke-radakovic-61m2-jovanke-radakovic/NkW5Qg22seE/',
'https://www.nekretnine.rs/stambeni-objekti/stanovi/zvezdara-deram-pijaca-duke-dinic-80m2-duke-dinic/Nk26as4b71N/']

condo_agency_home_phones = []
condo_agency_cell_phones = []

# Common path to the contact box; form:nth-child(2) holds the home phone,
# form:nth-child(4) holds the cell phone
CONTACT = ('body > div:nth-child(14) > div.row.pt-4 > div.col-lg-4.mb-5 > '
           'div.border-box.pt-3.pl-3.pr-3.pb-0.d-none.d-lg-block > div > div.row > '
           'div.col-12.col-sm-6.contact-footer > div > div')

options = Options()
options.headless = False
driver = webdriver.Chrome('/Users/Nenad/chromedriver', options=options)
for condo in condos:
    driver.get(condo)
    try:
        # click() returns None, so there is nothing useful to assign here
        driver.find_element(By.CSS_SELECTOR, CONTACT + ' > form:nth-child(2) > button').click()
        sleep(randint(3, 5))
        driver.find_element(By.CSS_SELECTOR, CONTACT + ' > form:nth-child(4) > button').click()
        sleep(randint(3, 5))
        home_phone = driver.find_element(By.CSS_SELECTOR, CONTACT + ' > form:nth-child(2) > span')
        cell_phone = driver.find_element(By.CSS_SELECTOR, CONTACT + ' > form:nth-child(4) > span')
        condo_agency_home_phones.append(home_phone.text)
        condo_agency_cell_phones.append(cell_phone.text)
    except WebDriverException:  # element not found, click intercepted, etc.
        condo_agency_home_phones.append('NaN')
        condo_agency_cell_phones.append('NaN')
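Since the symptom is "sometimes it works, sometimes it doesn't", one option is to retry each page a few times and keep both numbers for a page together, so the two lists can never drift out of step. A minimal stdlib-only sketch, assuming a hypothetical scrape_one(url) function (which you would write with Selenium) that returns a (home, cell) tuple or raises on failure; scrape_with_retries and its parameters are illustrative names, not part of any library:

```python
import time
from typing import Callable, Dict, Iterable, Tuple, Type

Phones = Tuple[str, str]  # (home, cell)


def scrape_with_retries(
    urls: Iterable[str],
    scrape_one: Callable[[str], Phones],  # hypothetical per-page scraper
    attempts: int = 3,
    delay: float = 2.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> Dict[str, Phones]:
    """Return {url: (home, cell)}; ('NaN', 'NaN') if every attempt fails."""
    results: Dict[str, Phones] = {}
    for url in urls:
        results[url] = ('NaN', 'NaN')  # fallback if all attempts fail
        for attempt in range(1, attempts + 1):
            try:
                results[url] = scrape_one(url)
                break  # success: move on to the next page
            except retry_on as exc:
                print(f'{url}: attempt {attempt} of {attempts} failed: {exc!r}')
                if attempt < attempts:
                    time.sleep(delay)  # give the page a moment before retrying
    return results
```

With Selenium you would probably pass retry_on=(WebDriverException,) rather than catching every Exception. Because dicts keep insertion order, [home for home, _ in results.values()] then gives the home numbers in page order.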

The solution I was given is:

driver.find_element_by_css_selector('button[type="button"]').click()

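It clicks the button only some of the time, and even after the click I still don't know how to extract the numbers. If anyone knows how to do this, please let me know.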

4 answers:

Answer 0 (score: 0)

Welcome to SO. Here are your options.

Option 1: Use an expected condition (this way you make sure the element has been found before you click it).

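from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
wait = WebDriverWait(driver, 10)
ele = wait.until(EC.presence_of_element_located((By.XPATH, "//button[.='Prikaži broj']")))
ele.click()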
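An explicit wait works by polling: it calls a condition repeatedly until the condition returns something truthy, and gives up after the timeout. That is also why wait.until(...) hands back the element itself. The following is a simplified stdlib-only model of that idea, not Selenium's actual source; wait_until and its parameters are hypothetical names. For comparison, WebDriverWait polls every 0.5 s by default and ignores NoSuchElementException while polling.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar('T')


def wait_until(
    condition: Callable[[], T],
    timeout: float = 10.0,
    poll: float = 0.5,
    ignored: Tuple[Type[BaseException], ...] = (),
) -> T:
    """Call condition() until it returns a truthy value, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
            if value:
                return value  # the truthy result itself is handed back
        except ignored:
            pass  # treat these errors as "not ready yet" and poll again
        if time.monotonic() >= deadline:
            raise TimeoutError(f'condition not met within {timeout} s')
        time.sleep(poll)
```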

Option 2: Use JavaScript. (This is similar to dispatching a click event on the button.)

ele = driver.find_element(By.XPATH, "//button[.='Prikaži broj']")
driver.execute_script("arguments[0].click();", ele)

Answer 1 (score: 0)

You could also consider waiting until the element is clickable:

WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "form button[type=button]"))).click()

Additional imports:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
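The difference from presence_of_element_located is the condition: presence only needs the element to exist in the DOM, while element_to_be_clickable also requires it to be displayed and enabled. A condition is just a callable that takes the driver, so a hand-written equivalent can be sketched as below. This is a simplified illustration (clickable is a hypothetical name, and Selenium's own implementation differs in details), tested here with stand-in objects instead of a real browser.

```python
from typing import Any, Callable, Tuple

Locator = Tuple[str, str]  # e.g. (By.CSS_SELECTOR, "form button[type=button]")


def clickable(locator: Locator) -> Callable[[Any], Any]:
    """Condition for WebDriverWait.until: the element if displayed and enabled, else False."""
    def condition(driver: Any) -> Any:
        element = driver.find_element(*locator)  # raises if not in the DOM yet
        if element.is_displayed() and element.is_enabled():
            return element
        return False  # present but not clickable yet: keep waiting
    return condition
```

Usage would look like WebDriverWait(driver, 10).until(clickable((By.CSS_SELECTOR, "form button[type=button]"))).click().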

Answer 2 (score: 0)

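Use WebDriverWait to handle the dynamic elements. You also need to give the page a moment: after clicking the button that reveals the full phone number, call time.sleep(1).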

condo_agency_home_phones = []
condo_agency_cell_phones = []
wait = WebDriverWait(driver, 10)  # one wait object can be reused for every page
for condo in condos:
    driver.get(condo)

    try:
        # First "Prikaži broj" button -> first revealed number
        element = wait.until(expected_conditions.element_to_be_clickable((By.XPATH, "//button[contains(text(),'broj')]")))
        element.click()
        time.sleep(1)
        home_phone = wait.until(expected_conditions.element_to_be_clickable((By.XPATH, "(//span[@class='cell-number'])[1]")))
        home_text = home_phone.text

        # Next "Prikaži broj" button (assumes the clicked one no longer matches)
        element2 = wait.until(expected_conditions.element_to_be_clickable((By.XPATH, "//button[contains(text(),'broj')]")))
        element2.click()
        time.sleep(1)
        cell_phone = wait.until(expected_conditions.element_to_be_clickable((By.XPATH, "(//span[@class='cell-number'])[2]")))
        cell_text = cell_phone.text
    except WebDriverException:  # TimeoutException, click intercepted, etc.
        home_text, cell_text = 'NaN', 'NaN'
    # Append both together so the two lists stay aligned page by page
    condo_agency_home_phones.append(home_text)
    condo_agency_cell_phones.append(cell_text)

print(condo_agency_home_phones, condo_agency_cell_phones)

Note that you need the following imports:

from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions
from selenium.common.exceptions import WebDriverException
from selenium import webdriver
import time
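The time.sleep(1) is there because the full number is filled in shortly after the click. An alternative is a custom condition that waits until the span's text actually looks like a complete number, so the script waits no longer than necessary. A sketch, assuming (not verified against the site) that a revealed number contains at least 6 digits; number_revealed and min_digits are hypothetical names:

```python
import re
from typing import Any, Callable, Tuple

Locator = Tuple[str, str]  # e.g. (By.XPATH, "(//span[@class='cell-number'])[1]")


def number_revealed(locator: Locator, min_digits: int = 6) -> Callable[[Any], Any]:
    """Condition for WebDriverWait.until: the span text once it holds enough digits."""
    def condition(driver: Any) -> Any:
        text = driver.find_element(*locator).text.strip()
        digits = re.sub(r'\D', '', text)  # keep only the digits
        return text if len(digits) >= min_digits else False
    return condition
```

In the loop above this would replace the sleep and the second wait, e.g. home_text = wait.until(number_revealed((By.XPATH, "(//span[@class='cell-number'])[1]"))).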

Answer 3 (score: 0)

With the following code, which loads the Adblock extension, I get all the numbers most of the time:

from selenium import webdriver
from selenium.webdriver.chrome.options import Options

path_to_extension = r'C:\Users\Nenad\Desktop\3.42.0_0'
options = Options()
options.add_argument('--load-extension=' + path_to_extension)  # load the unpacked Adblock extension
options.headless = False
driver = webdriver.Chrome('/Users/Nenad/chromedriver', options=options)

The extension path was copied from:

C:\Users\Nenad\AppData\Local\Google\Chrome\User Data\Default\Extensions\gighmmpiobklfepjocnamgkkbiglidom

I think this is a workable solution.