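Python - Target web scraping

Time: 2021-06-01 14:45:37

Tags: python selenium selenium-webdriver web-scraping

I am trying to get the chip names from this Target link, and I want to fetch all 28 chips on the first page automatically. I wrote the code below. It opens the link, scrolls down (so that the names and images load), and then tries to get the names: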

import time
from selenium import webdriver
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
from webdriver_manager.chrome import ChromeDriverManager as CM

options = webdriver.ChromeOptions()
options.add_argument("--log-level=3")

mobile_emulation = {
    "userAgent": 'Mozilla/5.0 (Linux; Android 4.0.3; HTC One X Build/IML74K) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/83.0.1025.133 Mobile Safari/535.19'
}
options.add_experimental_option("mobileEmulation", mobile_emulation)

bot = webdriver.Chrome(executable_path=CM().install(), options=options)

bot.get('https://www.target.com/c/chips-snacks-grocery/-/N-5xsy7')
bot.set_window_size(500, 950)
time.sleep(5)

for i in range(0,3):
    ActionChains(bot).send_keys(Keys.END).perform()
    time.sleep(1)

product_names = bot.find_elements_by_class_name('Link-sc-1khjl8b-0 styles__StyledTitleLink-mkgs8k-5 kdCHb inccCG h-display-block h-text-bold h-text-bs flex-grow-one')

hrefList = []
for e in product_names:
    hrefList.append(e.get_attribute('href'))

for href in hrefList:
    print(href)

When I inspect the names in the browser, the part common to all the chips is the class name Link-sc-1khjl8b-0 styles__StyledTitleLink-mkgs8k-5 kdCHb inccCG h-display-block h-text-bold h-text-bs flex-grow-one. So, as you can see, I added the line find_elements_by_class_name('Link-sc-1khjl8b-0 styles__StyledTitleLink-mkgs8k-5 kdCHb inccCG h-display-block h-text-bold h-text-bs flex-grow-one'). But it returns an empty result. What is wrong? Can you help me? The solution can use selenium or bs4, it does not matter which.

3 answers:

Answer 0: (score: 2)

As long as you pass the right key, you can get all the data from the API.

import requests

url = 'https://redsky.target.com/redsky_aggregations/v1/web/plp_search_v1'

# Query parameters sent to the product listing endpoint
payload = {
    'key': 'ff457966e64d5e877fdbad070f276d18ecec4a01',
    'category': '5xsy7',
    'channel': 'WEB',
    'count': '28',
    'default_purchasability_filter': 'true',
    'include_sponsored': 'true',
    'offset': '0',
    'page': '/c/5xsy7',
    'platform': 'desktop',
    'pricing_store_id': '1771',
    'scheduled_delivery_store_id': '1771',
    'store_ids': '1771,1768,1113,3374,1792',
    'useragent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.212 Safari/537.36',
    'visitor_id': '0179C80AE1090201B5D5C1D895ADEA6C',
}

jsonData = requests.get(url, params=payload).json()

# Each product carries its title, purchase link and primary image
for each in jsonData['data']['search']['products']:
    title = each['item']['product_description']['title']
    buy_url = each['item']['enrichment']['buy_url']
    image_url = each['item']['enrichment']['images']['primary_image_url']
    print(title)

Output:

Ruffles Cheddar & Sour Cream Potato Chips - 2.5oz
Doritos 3D Crunch Chili Cheese Nacho - 6oz
Hippeas Vegan White Cheddar Organic Chickpea Puffs - 5oz
PopCorners Spicy Queso - 7oz
Doritos 3D Crunch Spicy Ranch - 6oz
Pringles Snack Stacks Variety Pack Potato Crisps Chips - 12.9oz/18ct
Frito-Lay Variety Pack Flavor Mix - 18ct
Doritos Nacho Cheese Chips - 9.75oz
Hippeas Nacho Vibes Organic Chickpea Puffs - 5oz
Tostitos Scoops Tortilla Chips -10oz
Ripple Potato Chips Party Size - 13.5oz - Market Pantry™
Ritz Crisp & Thins Cream Cheese & Onion Potato And Wheat Chips - 7.1oz
Pringles Sour Cream & Onion Potato Crisps Chips - 5.5oz
Original Potato Chips Party Size - 15.25oz - Market Pantry™
Organic White Corn Tortilla Chips - 12oz - Good & Gather™
Sensible Portions Sea Salt Garden Veggie Straws - 7oz
Traditional Kettle Chips - 8oz - Good & Gather™
Lay's Classic Potato Chips - 8oz
Cheetos Crunchy Flamin Hot - 8.5oz
Sweet Potato Kettle Chips - 7oz - Good & Gather™
SunChips Harvest Cheddar Flavored Wholegrain Snacks - 7oz
Frito-Lay Variety Pack Classic Mix - 18ct
Doritos Cool Ranch Chips - 10.5oz
Lay's Wavy Original Potato Chips - 7.75oz
Frito-Lay Variety Pack Family Fun Mix - 18ct
Cheetos Jumbo Puffs - 8.5oz
Frito-Lay Fun Times Mix Variety Pack - 28ct
Doritos Nacho Cheese Flavored Tortilla Chips - 15.5oz
Lay's Barbecue Flavored Potato Chips - 7.75oz
SunChips Garden Salsa Flavored Wholegrain Snacks - 7oz
Pringles Snack Stacks Variety Pack Potato Crisps Chips - 12.9oz/18ct
Frito-Lay Variety Pack Doritos & Cheetos Mix - 18ct
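
The loop above reads buy_url and image_url but only prints the title. A minimal sketch of collecting all three fields into a list follows. It assumes the response keeps the nesting used in the loop; the helper name extract_products and the sample data are hypothetical, and the sample URLs are placeholders. Missing fields become None instead of raising KeyError.

```python
def extract_products(json_data):
    """Return title, buy_url and image_url for every product in the response."""
    products = json_data.get("data", {}).get("search", {}).get("products", [])
    rows = []
    for product in products:
        item = product.get("item", {})
        enrichment = item.get("enrichment", {})
        rows.append({
            "title": item.get("product_description", {}).get("title"),
            "buy_url": enrichment.get("buy_url"),
            "image_url": enrichment.get("images", {}).get("primary_image_url"),
        })
    return rows


# Hand-written sample shaped like the response above (not real API output).
sample = {
    "data": {"search": {"products": [
        {"item": {
            "product_description": {"title": "Lay's Classic Potato Chips - 8oz"},
            "enrichment": {
                "buy_url": "https://example.com/buy",
                "images": {"primary_image_url": "https://example.com/img.jpg"},
            },
        }},
        # A product with no enrichment data still yields a row.
        {"item": {"product_description": {"title": "Cheetos Jumbo Puffs - 8.5oz"}}},
    ]}}
}

rows = extract_products(sample)
for row in rows:
    print(row["title"], row["buy_url"], row["image_url"])
```

With the real response, extract_products(jsonData) would take the place of the sample call.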

Answer 1: (score: 1)

This also works:

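# Each product card is an <li> tagged with this data-test attribute
product_names = bot.find_elements_by_xpath("//li[@data-test='list-entry-product-card']")

for e in product_names:
    # Print the href of the first link inside the card
    print(e.find_element_by_css_selector("a").get_attribute("href"))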
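
The find_elements_by_* helpers used above were deprecated in Selenium 4 and later removed, in favor of find_elements(by, value). A minimal sketch of the same lookup in that style follows. The function name collect_card_hrefs is hypothetical, and it assumes the page still marks product cards with the data-test attribute shown above. The two locator strings equal selenium's By.XPATH and By.CSS_SELECTOR constants.

```python
# Locator strategy strings; these are the values of By.XPATH and By.CSS_SELECTOR.
XPATH = "xpath"
CSS_SELECTOR = "css selector"


def collect_card_hrefs(driver):
    """Return the href of the first link inside every product card."""
    cards = driver.find_elements(XPATH, "//li[@data-test='list-entry-product-card']")
    hrefs = []
    for card in cards:
        # find_elements returns an empty list instead of raising when a card has no link
        links = card.find_elements(CSS_SELECTOR, "a")
        if links:
            hrefs.append(links[0].get_attribute("href"))
    return hrefs


# Usage with the driver from the question (requires a running browser):
#     for href in collect_card_hrefs(bot):
#         print(href)
```

Using find_elements for the inner lookup is a design choice: a card without a link is skipped rather than stopping the loop with NoSuchElementException.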

Answer 2: (score: 0)

Try this:

product_names = bot.find_elements_by_css_selector('.Link-sc-1khjl8b-0.styles__StyledTitleLink-mkgs8k-5.kdCHb.inccCG.h-display-block.h-text-bold.h-text-bs.flex-grow-one')

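find_elements_by_class_name() expects a single class name, so it does not handle spaces in the class name properly. To match several classes at once, use a CSS selector in which every class is prefixed with a dot.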
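
To make this concrete, here is a minimal sketch that turns the space-separated class attribute copied from the browser inspector into a CSS selector. The helper name classes_to_css_selector is hypothetical and not part of Selenium.

```python
def classes_to_css_selector(class_attr, tag=""):
    """Build a CSS selector matching elements that carry every listed class."""
    classes = class_attr.split()  # split() collapses any run of whitespace
    if not classes:
        raise ValueError("expected at least one class name")
    # Prefix each class with a dot and join without spaces: .a.b.c
    return tag + "".join("." + name for name in classes)


copied = ("Link-sc-1khjl8b-0 styles__StyledTitleLink-mkgs8k-5 kdCHb inccCG "
          "h-display-block h-text-bold h-text-bs flex-grow-one")
selector = classes_to_css_selector(copied)
print(selector)
```

The resulting string can be passed to find_elements_by_css_selector(). Generated class names such as kdCHb are likely to change between site builds, so a selector built from them may need to be refreshed.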

Except that this selector did not work for me; I had to use '.Link-sc-1khjl8b-0.ItemLink-sc-1eyz3ng-0.kdCHb.dtKueh' instead.