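I have the code below, which gives me the correct href links to the product detail pages, but my scraping results come back as empty lists. I want to get the product description under the "Add to Cart" button. What am I missing here?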
Output:
https://www.nike.com/t/nikecourt-air-zoom-vapor-x-mens-hard-court-tennis-shoe-6J0fk8/AA8030-103
[]
https://www.nike.com/t/nikecourt-zoom-cage-3-mens-hard-court-tennis-shoe-mbXWvX
[]
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from random import randint
from time import sleep

final = []
with requests.Session() as s:
    driver = webdriver.Chrome('/Users/Selenium/bin/chromedriver')
    ###########THIS IS THE URL
    driver.get('https://store.nike.com/us/en_us/pw/mens-tennis-shoes/7puZ8r0Zoi3')
    products = [element for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='grid-item fullSize']")))]
    driver.execute_script('el = document.elementFromPoint(47, 457); el.click();')
    soup = bs(driver.page_source, 'lxml')
    items = soup.select('.grid-item-content')
    titles = [item.find("p", {"class" : lambda L: L and L.startswith('product-display-name')}).text.strip() for item in items]
    links = [item.find('a')['href'] for item in items]
    results = list(zip(titles, links))
    df = pd.DataFrame(results)
    for result in results:
        res = s.get(result[1])
        soup = bs(res.content, 'lxml')
        print(result[1])
        details = [item for item in soup.select('.description-preview fs16-sm css-1pbvugb')]
        print(details)
    driver.quit()
Answer 0 (score: 1)
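It looks like the description is rendered into the page by JavaScript, so the plain HTML that requests downloads doesn't contain it. You can use driver.page_source again inside the loop instead. The class names in the selector also need to be chained with dots (.description-preview.fs16-sm.css-1pbvugb); separated by spaces, they are treated as descendant selectors and match nothing.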
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from random import randint
from time import sleep
#'/Users/Selenium/bin/chromedriver'

final = []
with requests.Session() as s:
    driver = webdriver.Chrome('/Users/Selenium/bin/chromedriver')
    ###########THIS IS THE URL
    driver.get('https://store.nike.com/us/en_us/pw/mens-tennis-shoes/7puZ8r0Zoi3')
    products = [element for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='grid-item fullSize']")))]
    driver.execute_script('el = document.elementFromPoint(47, 457); el.click();')
    soup = bs(driver.page_source, 'lxml')
    items = soup.select('.grid-item-content')
    titles = [item.find("p", {"class" : lambda L: L and L.startswith('product-display-name')}).text.strip() for item in items]
    links = [item.find('a')['href'] for item in items]
    results = list(zip(titles, links))
    df = pd.DataFrame(results)
    for result in results:
        driver.get(result[1])
        soup = bs(driver.page_source, 'lxml')
        print(result[1])
        details = [item.text for item in soup.select('.description-preview.fs16-sm.css-1pbvugb')]
        print(details)
    driver.quit()
Output:
https://www.nike.com/t/nikecourt-air-zoom-vapor-x-mens-hard-court-tennis-shoe-6J0fk8/AA8030-103
['With Nike Zoom Air and a Dynamic Fit system, the NikeCourt Air Zoom Vapor X provides ultimate control on hard courts.Shown: White/BlackStyle: AA8030-103']
https://www.nike.com/t/nikecourt-zoom-cage-3-mens-hard-court-tennis-shoe-mbXWvX
['The NikeCourt Zoom Cage 3 is made for the player seeking strength and speed on the hard court. The shoe’s unique cage design provides maximum durability and cushioning, and is also lighter than ever.Shown: Black/WhiteStyle: 918193-006']
https://www.nike.com/t/nikecourt-air-zoom-zero-mens-tennis-shoe-nHMRHN
['Featuring the first full-length Zoom Air unit in NikeCourt history, the NikeCourt Air Zoom Zero delivers exceptional responsiveness and great court feel. Its snug-fitting upper and webbed lacing system offer second-skin-like comfort and lockdown.Shown: Black/Black/WhiteStyle: AA8018-003']
https://www.nike.com/t/nikecourt-air-max-wildcard-mens-tennis-shoe-p9NhX7
['The NikeCourt Air Max Wildcard delivers the comfort you need to hit hard and move fast on the court. A Max Air unit under your heel cushions every step, while an innovative Lunarlon midsole provides a springy underfoot sensation and extra stability.Shown: Black/Phantom/Bright Crimson/PhantomStyle: AO7351-006']
https://www.nike.com/t/nikecourt-zoom-cage-3-mens-hard-court-tennis-shoe-l3qpKZ/918193-005
['The NikeCourt Zoom Cage 3 is made for the player seeking strength and speed on the hard court. The shoe’s unique cage design provides maximum durability and cushioning, and is also lighter than ever.Shown: Platinum Tint/Laser Fuchsia/Thunder GreyStyle: 918193-005']
https://www.nike.com/t/nikecourt-air-zoom-resistance-mens-hard-court-tennis-shoe-qmZW1o/918194-003
['The\xa0NikeCourt Air Zoom Resistance delivers lightweight durability on the hard court with a performance leather upper.Shown: Black/Bright Crimson/WhiteStyle: 918194-003']
https://www.nike.com/t/nikecourt-air-zoom-prestige-mens-hard-court-tennis-shoe-vY8981
['The NikeCourt Air Zoom Prestige combines the responsiveness of Zoom Air technology with the lockdown of Dynamic Fit for glove-like comfort and support on hard courts.Shown: Vast Grey/Indigo Force/Indigo ForceStyle: AA8020-054']
https://www.nike.com/t/nikecourt-lite-mens-hard-court-tennis-shoe-7qqvCd
['The NikeCourt Lite is built for total comfort with a premium upper and a durable outsole designed for hard\xa0courts.Shown: White/Medium Grey/BlackStyle: 845021-100']
https://www.nike.com/t/nikecourt-lite-mens-hard-court-tennis-shoe-VrTWWAE1/845021-054
['The NikeCourt Lite is built for total comfort with a premium upper and a durable outsole designed for hard\xa0courts.Shown: Vast Grey/Indigo ForceStyle: 845021-054']
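The other change in the code above is the selector: the question used '.description-preview fs16-sm css-1pbvugb', the answer '.description-preview.fs16-sm.css-1pbvugb'. A minimal offline check of the difference, assuming a made-up, simplified snippet of the product-page markup (the real page is rendered by JavaScript, and generated class names such as css-1pbvugb may change at any time). It uses the built-in html.parser so it runs without lxml:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified stand-in for the rendered product-page markup.
html = '<div class="description-preview fs16-sm css-1pbvugb"><p>Sample description.</p></div>'
soup = BeautifulSoup(html, "html.parser")

# Spaces are descendant combinators: this asks for a <css-1pbvugb> tag inside
# an <fs16-sm> tag inside an element with class "description-preview".
spaced = soup.select(".description-preview fs16-sm css-1pbvugb")

# Dots chain the classes: a single element that carries all three classes.
chained = soup.select(".description-preview.fs16-sm.css-1pbvugb")

print(spaced)                                       # []
print([el.get_text(strip=True) for el in chained])  # ['Sample description.']
```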
Answer 1 (score: 1)
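I tried to see whether I could hit the API directly and grab the data there, but couldn't find it. However, the data is available as JSON inside a <script> tag. Just find it, then traverse it to get what you want. It also contains prices, customer reviews, and all sorts of other data: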
import requests
from bs4 import BeautifulSoup as bs
import pandas as pd
import time
from selenium import webdriver
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from random import randint
from time import sleep
import json

final = []
with requests.Session() as s:
    s.headers.update({'Accept-Language': 'en-US'})
    driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
    ###########THIS IS THE URL
    driver.get('https://store.nike.com/us/en_us/pw/mens-tennis-shoes/7puZ8r0Zoi3')
    products = [element for element in WebDriverWait(driver, 30).until(EC.visibility_of_all_elements_located((By.XPATH, "//div[@class='grid-item fullSize']")))]
    driver.execute_script('el = document.elementFromPoint(47, 457); el.click();')
    soup = bs(driver.page_source, 'lxml')
    items = soup.select('.grid-item-content')
    titles = [item.find("p", {"class" : lambda L: L and L.startswith('product-display-name')}).text.strip() for item in items]
    links = [item.find('a')['href'] for item in items]
    results = list(zip(titles, links))
    df = pd.DataFrame(results)
    for result in results:
        headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/73.0.3683.103 Safari/537.36',
                   'Accept-Language': 'en-US'}
        res = s.get(result[1], headers=headers)
        soup = bs(res.text, 'lxml')
        print(result[1])
        scripts = soup.find_all('script')
        for script in scripts:
            if 'window.INITIAL_REDUX_STATE=' in script.text:
                jsonStr = script.text.split('window.INITIAL_REDUX_STATE=')[1]
                jsonStr = jsonStr.rsplit(';', 1)[0]
                jsonData = json.loads(jsonStr)
                for k, v in jsonData['Threads']['products'].items():
                    details = bs(v['description'], 'lxml').text
                    print(details, '\n')
    driver.quit()
Output:
https://www.nike.com/t/nikecourt-air-zoom-vapor-x-mens-hard-court-tennis-shoe-6J0fk8/AA8030-103
ULTRALIGHT SPEED.With Nike Zoom Air and a Dynamic Fit system, the NikeCourt Air Zoom Vapor X provides ultimate control on hard courts.Secure FitThe Dynamic Fit system wraps your foot from the bottom of the arch up to the laces for a glove-like fit.Responsive CushioningA Zoom Air unit in the heel offers low-profile, resilient cushioning from swing to swing.Quick StabilityThe full-length TPU foot frame wraps up the outside of your foot for added stability on every turn and swing.More BenefitsPadded collar provides additional comfort.Built up rubber on the toe increases durability and protection from drag.Non-marking rubber outsole for durable traction on hard courts.Shown: Black/Bright Crimson/WhiteStyle: AA8030-016
https://www.nike.com/t/nikecourt-zoom-cage-3-mens-hard-court-tennis-shoe-mbXWvX
STRENGTH AND SPEED.The NikeCourt Zoom Cage 3 is made for the player seeking strength and speed on the hard court. The shoe’s unique cage design provides maximum durability and cushioning, and is also lighter than ever.Maximum DurabilityMade with a lightweight CPU cage built up in the high wear zone areas specific to tennis. “Zoned” cage adds stability without adding weight.Exceptional TractionThe modified herringbone outsole delivers excellent traction and durability. Ideal for hard court surfaces.
Complete ComfortNike Zoom Air unit in the heel delivers responsive, lightweight cushioning.More BenefitsExternal heel clip is efficiently shaped to secure the heel.Flexible support in the midfoot provides lightweight stability.Full bootie construction wraps your foot for a snug fit.Kurim material on upper allows for elasticity and flexibility.Shown: White/Light Carbon/Light Blue Fury/ObsidianStyle: 918193-104
https://www.nike.com/t/nikecourt-air-zoom-zero-mens-tennis-shoe-nHMRHN
COURT FEEL, OPTIMIZED.Featuring the first full-length Zoom Air unit in NikeCourt history, the NikeCourt Air Zoom Zero delivers exceptional responsiveness and great court feel. Its snug-fitting upper and webbed lacing system offer second-skin-like comfort and lockdown.BenefitsFull-length Zoom Air unit is curved to deliver responsive cushioning.Integrated crash pad helps promote a smooth heel-to-toe transition.1/2 sleeve provides a snug, sock-like fit.Gilly straps on the medial and lateral side integrate with the laces for a customizable fit.Midsole foam on top of the front Zoom Air unit brings the unit closer to the ground.Midsole foam underneath the back of the Zoom Air unit brings the unit closer to your heel.Outsole is cored out in the middle to reduce weight and show off the Zoom Air unit.Outsole material wraps over the toe on the medial side for added durability while sliding.Shown: Vast Grey/Indigo ForceStyle: AA8018-044
...
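Answer 1's extraction step can be tried without a browser or network access. The sketch below assumes a made-up page whose <script> assigns a small JSON object to window.INITIAL_REDUX_STATE, shaped only like the Threads → products → description path the answer reads; the real payload is much larger and its keys may have changed. It swaps in two small substitutions that are not part of the answer: json.JSONDecoder().raw_decode parses exactly one JSON value and ignores whatever JavaScript follows it (instead of relying on rsplit(';', 1)), and get_text(" ", strip=True) keeps headings and sentences apart, where the output above shows them run together (for example "ULTRALIGHT SPEED.With Nike Zoom Air"):

```python
import json

from bs4 import BeautifulSoup

MARKER = "window.INITIAL_REDUX_STATE="

# Made-up page for illustration; it mirrors only the keys the answer reads.
page = """
<html><body>
<script>window.INITIAL_REDUX_STATE={"Threads": {"products": {"AA8030-103": {
  "description": "<p><b>ULTRALIGHT SPEED.</b>With Nike Zoom Air.</p><p>Shown: White/Black</p>"
}}}};window.other = 1;</script>
</body></html>
"""


def extract_state(html):
    """Return the JSON object assigned to window.INITIAL_REDUX_STATE, or None."""
    soup = BeautifulSoup(html, "html.parser")
    for script in soup.find_all("script"):
        text = script.string or ""
        if MARKER in text:
            payload = text.split(MARKER, 1)[1].lstrip()
            # raw_decode parses one JSON value and reports where it ended,
            # so the trailing JavaScript after the object is ignored.
            state, _end = json.JSONDecoder().raw_decode(payload)
            return state
    return None


def product_descriptions(state):
    """Map each key under Threads/products to its plain-text description."""
    out = {}
    for key, product in state["Threads"]["products"].items():
        # The description is itself HTML; a separator keeps text nodes apart.
        out[key] = BeautifulSoup(product["description"], "html.parser").get_text(" ", strip=True)
    return out


for style, text in product_descriptions(extract_state(page)).items():
    print(style, "->", text)
    # AA8030-103 -> ULTRALIGHT SPEED. With Nike Zoom Air. Shown: White/Black
```

On the live site, html would be res.text from the answer's loop; whether the marker and key layout still exist there is not verified here.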