Extract all links from drop-down list combinations

Time: 2020-02-21 21:18:36

Tags: python web-scraping beautifulsoup

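I have a sample website from which I want to extract all the href links. It has two drop-down menus; once an option has been selected in each, the page displays the results together with links to the manuals to download. It does not navigate to another page but shows the results on the same page. I have already extracted the drop-down combinations and am now trying to extract the manual links, but I cannot find them.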

The code is as follows:

from selenium import webdriver
from selenium.webdriver.support.ui import Select
import time
from bs4 import BeautifulSoup
import requests


url = "https://www.cars.com/"

driver = webdriver.Chrome('C:/Users/webdrivers/chromedriver.exe')
driver.get(url)
time.sleep(4)

selectYear = Select(driver.find_element_by_id("odl-selected-year"))

data = []
for yearOption in selectYear.options:
    yearText = yearOption.text
    selectYear.select_by_visible_text(yearText)
    time.sleep(1)

    selectModel = Select(driver.find_element_by_id("odl-selected-model"))
    for modelOption in selectModel.options:
        modelText = modelOption.text
        selectModel.select_by_visible_text(modelText)
        data.append([yearText,modelText])

page = requests.get(url, headers=headers)
soup = BeautifulSoup(page.text, 'html.parser')

content = soup.findAll('div',attrs={"class":"odl-results-container"})

for i in content:
    x = i.findAll(['h3','span'])
    for y in x:
        print(y.get_text())

The print statement shows no data. How can I get the links to the manuals? Thanks in advance.

1 answer:

Answer 0 (score: 0)

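You need to click the button for each car model and year, and then take the rendered HTML page source from the Selenium web driver rather than fetching it with requests. A separate requests.get(url) call downloads a fresh copy of the page in which no drop-down selection has been made and no JavaScript has run, so the results container is empty.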

Add this inside your inner loop:

C:\(new destination)

Printed output:

os.system('gst-play-1.0 sine_wave.mp3')
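
A minimal sketch of the approach this answer describes, under several assumptions. The element ids and the `odl-results-container` class are taken from the question. The names `extract_links`, `collect_manual_links`, `wait_seconds` and `submit_locator` are hypothetical, and the button's locator is not given anywhere in the thread, so it is left as an optional parameter. The link extraction uses only the standard library so that it runs without extra packages; passing the same `driver.page_source` string to `BeautifulSoup(..., 'html.parser')` would work just as well.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ResultLinkParser(HTMLParser):
    """Collect href values of <a> tags nested inside a <div> with a given class."""

    def __init__(self, container_class, base_url):
        super().__init__()
        self.container_class = container_class
        self.base_url = base_url
        self.div_depth = 0  # open <div> count, starting at the container
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            classes = (attrs.get("class") or "").split()
            if self.div_depth > 0 or self.container_class in classes:
                self.div_depth += 1
        elif tag == "a" and self.div_depth > 0 and attrs.get("href"):
            # Resolve relative hrefs against the page URL.
            self.links.append(urljoin(self.base_url, attrs["href"]))

    def handle_endtag(self, tag):
        if tag == "div" and self.div_depth > 0:
            self.div_depth -= 1


def extract_links(html, base_url, container_class="odl-results-container"):
    """Return absolute URLs of all links inside the results container."""
    parser = ResultLinkParser(container_class, base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def collect_manual_links(driver, base_url, wait_seconds=1, submit_locator=None):
    """Select every year/model pair and read the links from the rendered DOM.

    submit_locator is a hypothetical (By, value) tuple for the button the
    answer mentions; its real locator is not known from the thread.
    """
    # Imported lazily so extract_links() works without Selenium installed.
    import time
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import Select

    results = []
    years = Select(driver.find_element(By.ID, "odl-selected-year"))
    for year in [o.text for o in years.options]:
        # Re-find the element each time in case the page re-rendered it.
        Select(driver.find_element(By.ID, "odl-selected-year")).select_by_visible_text(year)
        time.sleep(wait_seconds)
        models = Select(driver.find_element(By.ID, "odl-selected-model"))
        for model in [o.text for o in models.options]:
            Select(driver.find_element(By.ID, "odl-selected-model")).select_by_visible_text(model)
            if submit_locator is not None:
                driver.find_element(*submit_locator).click()
            time.sleep(wait_seconds)
            # page_source reflects the DOM after JavaScript has updated it.
            for link in extract_links(driver.page_source, base_url):
                results.append((year, model, link))
    return results
```

With the `driver` and `url` from the question, this would be called as `collect_manual_links(driver, url)`. The fixed sleeps mirror the question's code; an explicit wait on the results container would likely be more reliable.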