Trying to scrape from multiple links

Time: 2018-11-21 15:30:14

Tags: python selenium web-scraping beautifulsoup python-requests

The goal of this script is to visit a website, generate links to specific products, and then scrape from those generated links.

The script grabs the links to the four featured products displayed on the product home page, matching them by a defined class attribute. The links are saved in the variable links, which holds the four URLs pointing to the four featured products.

I then use requests to fetch each product URL and scrape the data with BeautifulSoup.

Here is my code:

import time
from selenium import webdriver
import selenium.webdriver.chrome.service as service
import requests
from bs4 import BeautifulSoup


url = "https://www.vatainc.com/"

# Start a local chromedriver service and attach a remote driver to it.
service = service.Service('/Users/Name/Downloads/chromedriver.exe')
service.start()
capabilities = {'chrome.binary': '/Google/Chrome/Application/chrome.exe'}
driver = webdriver.Remote(service.service_url, capabilities)
driver.get(url)
time.sleep(2)

# Collect the href from each featured-product link on the home page.
links = [x.get_attribute('href') for x in driver.find_elements_by_xpath("//*[contains(@class, 'product-name')]/a")]


# This passes the whole list to requests.get() and raises the error below.
html = requests.get(links).text
soup = BeautifulSoup(html, "html.parser")
products = soup.findAll("div")
print(products)

The error I get:

No connection adapters were found for '['https://www.vatainc.com/0240-bonnie-bone-marrow-biopsy-skills-trainertm.html', 'https://www.vatainc.com/0910-seymour-iitm-wound-care-model-1580.html', 'https://www.vatainc.com/2410-chester-chesttm-with-new-advanced-arm-1197.html', 'https://www.vatainc.com/2365-advanced-four-vein-venipuncture-training-aidtm-dermalike-iitm-latex-free.html']'

1 Answer:

Answer 0 (score: 0)

links is a list of strings (URLs). You can't pass a list as the url argument to requests.get(). Try iterating over the list, passing each URL one at a time and fetching each page:

for link in links:
    html = requests.get(link).text
    soup = BeautifulSoup(html, "html.parser")
    products = soup.findAll("div")
    print(products)
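
As a small extension, here is a minimal sketch of the same loop that reuses a single requests.Session (so the TCP connection is pooled across the four requests) and narrows the parse to elements with the product-name class rather than printing every div. Note that the product-name class on the detail pages is an assumption carried over from the XPath in the question; adjust the selector to match the actual page markup.

import requests
from bs4 import BeautifulSoup

session = requests.Session()  # one pooled connection for all requests

for link in links:  # links is the list scraped with Selenium above
    response = session.get(link)
    response.raise_for_status()  # stop early on a non-2xx status
    soup = BeautifulSoup(response.text, "html.parser")
    # "product-name" is assumed from the question's XPath; the detail
    # pages may use a different class, so adjust the selector as needed.
    names = [tag.get_text(strip=True) for tag in soup.find_all(class_="product-name")]
    print(link, names)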