Problem getting tags from a specific page using Beautiful Soup in Python

Asked: 2019-03-11 19:24:52

Tags: python web-scraping

I am trying to get every piece of information from this page, www.toctoc.com, using the following code:

import requests
from bs4 import BeautifulSoup

page = requests.get('website_url')  # website URL was too long to include
soup = BeautifulSoup(page.content, 'html.parser')

name_box = soup.find_all('div', attrs={'class': 'item'})

Output: []

Does anyone know how to find all the markup inside each element with this class (one per post)?
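An empty list like this usually means the elements are not in the HTML that requests receives. The sketch below illustrates that with a made-up page: `raw_html` and `count_items` are hypothetical names, and the markup is an assumed stand-in for a response whose listings would only be added later by a script in the browser, not the real toctoc.com source.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML that requests receives: the results
# container is empty because a script would fill it in the browser.
raw_html = """
<html><body>
  <div id="results"></div>
  <script>/* listings would be added here by JavaScript */</script>
</body></html>
"""


def count_items(html):
    """Count <div> elements whose class list includes "item"."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("div", attrs={"class": "item"}))


print(count_items(raw_html))
```

Running the same count on `page.content` from the real request, and comparing it with what the browser's inspection tool shows, is a quick way to tell whether the page needs JavaScript.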

Screenshot of website with inspection tool

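1 answer:

Answer 0 (score: 0)

JavaScript has to run on the page before these elements exist, so requests alone will not see them. You can use Selenium with a wait for all of the elements to be present, then access the specific elements you need. I only show the top level for your class: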

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
url = 'https://www.toctoc.com/search/index2/?dormitorios=0&banos=0&superficieDesde=0&superficieHasta=0&precioDesde=0&precioHasta=0&moneda=UF&tipoArriendo=true&tipoVentaUsado=false&tipoVentaNuevo=false&casaDepto=8&ordenarPorMoneda=UFCLP&ordenarDesc=false&ordernarPorFechaPublicacion=false&ordernarPorSuperficie=false&ordernarPorPrecio=false&pagina=1&esMobile=false&textoBusqueda=Regi%C3%B3n%20Metropolitana&textoOriginal=Regi%C3%B3n%20Metropolitana&tipoVista=lista&viewport=-71.715363%2C-34.29047%2C-69.769737%2C-32.922085&comuna=&region=Regi%C3%B3n%20Metropolitana%20de%20Santiago&atributos=&idle=true&zoom=7.053707424896949&buscando=true&vuelveBuscar=false&dibujaPoligono=true&resetMapa=true&animacion=false&idZonaHomogenea=0&esPrimeraBusqueda=false'
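driver = webdriver.Chrome()
try:
    driver.get(url)
    # Wait up to 10 seconds for the JavaScript-rendered .item elements to appear
    items = WebDriverWait(driver, 10).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".item")))
    for item in items:
        print(item.text)
finally:
    driver.quit()  # close the browser even if the wait times out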
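If you would rather keep the BeautifulSoup parsing from the question, one option is to let Selenium render the page and then pass `driver.page_source` to BeautifulSoup. This is a minimal sketch, assuming Chrome and a matching driver are available; `extract_items` and `fetch_rendered_html` are hypothetical helper names, not part of either library.

```python
from bs4 import BeautifulSoup


def extract_items(html):
    """Return the visible text of every <div class="item"> in an HTML string."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        div.get_text(" ", strip=True)
        for div in soup.find_all("div", attrs={"class": "item"})
    ]


def fetch_rendered_html(url, timeout=10):
    """Load url in Chrome, wait for .item elements, and return the rendered HTML."""
    # Imported here so extract_items can be used without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        # Block until at least one .item element is present in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".item"))
        )
        return driver.page_source
    finally:
        driver.quit()
```

Usage would look like `for text in extract_items(fetch_rendered_html(url)): print(text)`, with `url` being the long search URL above. Keeping the parsing in its own function also lets you try selectors against saved HTML without opening a browser each time.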