Web scraping for a school project

Time: 2019-04-01 10:55:40

Tags: python selenium web-scraping

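I am trying to scrape data from a page using Selenium. It worked last week, but something changed this week and it no longer works. The problem is the "Show more" button, labelled "Prikaži broj" on the site. I need to scrape many pages, but let's focus on just one of them.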

The code is:

options = Options()
options.headless = True
driver = webdriver.Chrome('/Users/Nenad/chromedriver', options=options)
driver.get('https://www.nekretnine.rs/stambeni-objekti/stanovi/zvezdara-konjarnik-milica-rakica-57m2-milica-rakica/NkJXDiY2ugE/')
try:
    element = driver.find_element_by_css_selector('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > form:nth-child(2) > button:nth-child(2)').click()
    sleep(randint(3, 5))
    home_phone = driver.find_element_by_css_selector('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > form:nth-child(2) > span:nth-child(1)')
    condo_agency_cell_phones.append(home_phone.text)
except:
    condo_agency_cell_phones.append('NaN')
try:
    element = driver.find_element_by_css_selector('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > form:nth-child(4) > button:nth-child(2)').click()
    sleep(randint(3, 5))
    cell_phone = driver.find_element_by_css_selector('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > form:nth-child(4) > span:nth-child(1)')
    condo_agency_cell_phones.append(cell_phone.text)
except:
    condo_agency_cell_phones.append('NaN')
driver.close()

Last week it worked with XPath, but now it doesn't. I even managed to locate a button, but it does not get clicked:

options = Options()
options.headless = False
driver = webdriver.Chrome('/Users/Nenad/chromedriver', options=options)
driver.get('https://www.nekretnine.rs/stambeni-objekti/stanovi/zvezdara-konjarnik-milica-rakica-57m2-milica-rakica/NkJXDiY2ugE/')
sleep(20)
try:
    element = driver.find_element_by_xpath("//button[@type='button']").click()
    print(element.text)
except:
    print('NaN')

2 Answers:

Answer 0: (score: 1)

Try a CSS selector, find_element_by_css_selector('button[type="button"]'), instead of XPath.
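
A further thing to check in the second snippet of the question: click() returns None, so element ends up as None and print(element.text) raises an AttributeError, which the bare except then reports as 'NaN' even if the click itself succeeded. A minimal sketch of keeping the lookup and the click separate follows. click_and_get is a hypothetical helper name, and it assumes the Selenium 3 find_element_by_css_selector method already used in the question.

```python
def click_and_get(driver, selector='button[type="button"]'):
    """Locate an element by CSS selector, click it, and return the element.

    Hypothetical helper: `driver` is assumed to be a Selenium 3 WebDriver
    (anything with a find_element_by_css_selector method works).
    """
    button = driver.find_element_by_css_selector(selector)
    button.click()  # click() returns None, so do not assign its result
    return button   # the element itself is still usable, e.g. button.text
```

With a live driver this would be called as button = click_and_get(driver), followed by print(button.text).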

Answer 1: (score: 0)

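If the first answer does not solve your problem, try this. I changed some of the imports. In your code above, the "try:" blocks fail with undefined-name errors because the required libraries are never imported.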

from selenium import webdriver
from selenium.webdriver.chrome.options import Options  # Chrome options, to match webdriver.Chrome below
from time import sleep
options = Options()
options.headless = True
driver = webdriver.Chrome('/Users/Nenad/chromedriver', options=options)
# driver = webdriver.Firefox(executable_path=r'C:\Py\geckodriver.exe')

driver.get('https://www.nekretnine.rs/stambeni-objekti/stanovi/zvezdara-konjarnik-milica-rakica-57m2-milica-rakica/NkJXDiY2ugE/')
condo_agency_cell_phones = []
try:
    driver.find_element_by_css_selector('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > form:nth-child(2) > button:nth-child(2)').click()
    # sleep(randint(3, 5))
    sleep(4)
    # home_phone1 = driver.find_element_by_xpath("html/body/div[11]/div[1]/div[2]/div[1]/div/div[2]/div[2]/div/div/form[1]/span")
    # condo_agency_cell_phones.append(home_phone1.text)
    home_phone = driver.find_element_by_css_selector('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > form:nth-child(2) > span:nth-child(1)')
    print(home_phone.text)
    condo_agency_cell_phones.append(home_phone.text)
except:
    condo_agency_cell_phones.append('NaN')
try:
    driver.find_element_by_css_selector('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > form:nth-child(4) > button:nth-child(2)').click()
    # sleep(randint(3, 5))
    sleep(4)
    cell_phone = driver.find_element_by_css_selector('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) > div:nth-child(1) > form:nth-child(4) > span:nth-child(1)')
    condo_agency_cell_phones.append(cell_phone.text)
except:
    condo_agency_cell_phones.append('NaN')

print(condo_agency_cell_phones)
driver.quit()  # ends the session and also shuts down the chromedriver process
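
The two try blocks above differ only in their selectors, and the bare except hides the reason for any failure. As a rough sketch, they could be folded into one helper that prints the exception and polls for the revealed text instead of sleeping a fixed 4 seconds. reveal_text and its timeout and poll parameters are hypothetical names, not part of Selenium. The sketch also assumes the span stays empty until the number is revealed; if the page shows a masked number first, the loop condition would need to change.

```python
import time


def reveal_text(driver, button_selector, text_selector,
                timeout=5.0, poll=0.5, default='NaN'):
    """Click a button, then poll until the target element has text.

    Hypothetical helper built on the Selenium 3 method used above.
    Returns `default` on timeout or on any error, printing the error first.
    """
    try:
        driver.find_element_by_css_selector(button_selector).click()
        deadline = time.monotonic() + timeout
        while True:
            text = driver.find_element_by_css_selector(text_selector).text
            if text.strip():
                return text
            if time.monotonic() >= deadline:
                return default
            time.sleep(poll)
    except Exception as exc:
        # Show why the lookup failed instead of silently recording 'NaN'.
        print('lookup failed:', repr(exc))
        return default


# Usage with a live driver, reusing the selectors from the code above:
# base = ('div.row:nth-child(2) > div:nth-child(2) > div:nth-child(1) '
#         '> div:nth-child(1) > ')
# for form in ('form:nth-child(2)', 'form:nth-child(4)'):
#     condo_agency_cell_phones.append(
#         reveal_text(driver,
#                     base + form + ' > button:nth-child(2)',
#                     base + form + ' > span:nth-child(1)'))
```

Selenium also ships WebDriverWait in selenium.webdriver.support.ui for this kind of waiting; the hand-rolled loop is used here only to keep the sketch free of extra imports.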