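Hello, I'm trying to extract information from Macy's website, specifically from this category: 'https://www.macys.com/shop/featured/women-handbags'. But when I visit a specific item page, I get a blank page with the following message:
Access Denied You don't have permission to access "any of the item links listed on the category link above" on this server. Reference #18.14d6f7bd.1526927300.12232a22
I also tried changing the user agent through the Chrome options, but it didn't work.
Here is my code: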
import sys
reload(sys)
sys.setdefaultencoding('utf8')
from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException

url = 'https://www.macys.com/shop/featured/women-handbags'

def init_selenium():
    global driver
    driver = webdriver.Chrome("/Users/rodrigopeniche/Downloads/chromedriver")
    driver.get(url)

def find_page_items():
    items_elements = driver.find_elements_by_css_selector('li.productThumbnailItem')
    for index, element in enumerate(items_elements):
        items_elements = driver.find_elements_by_css_selector('li.productThumbnailItem')
        item_link = items_elements[index].find_element_by_tag_name('a').get_attribute('href')
        driver.get(item_link)
        driver.back()

init_selenium()
find_page_items()
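Any idea what is going on and what I can do to fix it?
Answer 0 (score: 0)
This isn't a Selenium-only solution from start to finish, but it works. You can give it a try.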
from selenium import webdriver
import requests
from bs4 import BeautifulSoup

url = 'https://www.macys.com/shop/featured/women-handbags'

def find_page_items(driver,link):
    driver.get(link)
    item_link = [item.find_element_by_tag_name('a').get_attribute('href') for item in driver.find_elements_by_css_selector('li.productThumbnailItem')]
    for newlink in item_link:
        res = requests.get(newlink,headers={"User-Agent":"Mozilla/5.0"})
        soup = BeautifulSoup(res.text,"lxml")
        name = soup.select_one("h1[itemprop='name']").text.strip()
        print(name)

if __name__ == '__main__':
    driver = webdriver.Chrome()
    try:
        find_page_items(driver,url)
    finally:
        driver.quit()
Output:
Mercer Medium Bonded-Leather Crossbody
Mercer Large Tote
Nolita Medium Satchel
Voyager Medium Multifunction Top-Zip Tote
Mercer Medium Crossbody
Kelsey Large Crossbody
Medium Mercer Gallery
Mercer Large Center Tote
Signature Raven Large Tote
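However, if you insist on sticking with Selenium, you need to create a new instance of it every time you navigate to a new URL, or, as a better option, clear the cache.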
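The two Selenium-only options mentioned above could look roughly like the sketch below. It is an illustration under assumptions, not a tested fix for Macy's: the helper names (`make_chrome_driver`, `titles_with_fresh_driver`, `titles_with_cleared_cookies`) are hypothetical, the page title stands in for whatever data you actually want to scrape, and deleting cookies with `delete_all_cookies()` is only an approximation of "clearing the cache". Whether either variant gets past the "Access Denied" page has not been verified here.

```python
def make_chrome_driver():
    """Start a new Chrome session (assumes chromedriver is on PATH)."""
    # Imported lazily so the helpers below work with any driver-like object.
    from selenium import webdriver
    return webdriver.Chrome()


def titles_with_fresh_driver(links, driver_factory=make_chrome_driver):
    """Option 1: open every link in its own, brand-new browser session."""
    titles = []
    for link in links:
        driver = driver_factory()
        try:
            driver.get(link)
            titles.append(driver.title)
        finally:
            # Always close the session, even if the page load fails.
            driver.quit()
    return titles


def titles_with_cleared_cookies(driver, links):
    """Option 2: reuse one session, deleting its cookies after each visit."""
    titles = []
    for link in links:
        driver.get(link)
        titles.append(driver.title)
        # Drop the cookies set by this visit before loading the next page.
        driver.delete_all_cookies()
    return titles
```

The first option is slower because it launches a browser per item, so the second is probably worth trying first, with a fresh session as the fallback if cookies turn out not to be what triggers the block.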