Question

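Hello, I'm trying to scrape the links to all the products on a specific Sephora page. My code only gives me the first 12 links, while the site shows 48 products. I think this is because Sephora is a user-interactive website (correct me if I'm using the wrong term), so it doesn't load the rest of the content. But I don't know how to get the rest. Please send help! Thanks!!!

Here is my code: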

from bs4 import BeautifulSoup
import requests

url = "https://www.sephora.com/brand/estee-lauder/skincare"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data,'html.parser')

link_list = []
keyword = 'product'
for link in soup.findAll('a'):
    href = link.get('href')
    # link.get() returns None for anchors without an href, so skip those
    if href and keyword in href:
        link_list.append('https://www.sephora.com/' + href)

Answer 1

If you look at the page source, you'll see that the data is stored as a JSON object. You can get the JSON object like this:

from bs4 import BeautifulSoup
import requests
import json

url = "https://www.sephora.com/brand/estee-lauder/skincare"
r = requests.get(url)
data = r.text
soup = BeautifulSoup(data,'html.parser')

data = json.loads(soup.find('script', id='linkJSON').text)
products = data[3]['props']['products']

prefix = "https://www.sephora.com"

url_links = [prefix+p["targetUrl"] for p in products]
print(url_links)
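
The hard-coded index in data[3] depends on how the page was laid out when this was written and may shift. Below is a minimal sketch that scans every component instead. It assumes the parsed JSON is a list of dicts and that product entries still carry a "targetUrl" key; the function name extract_product_urls and the sample data are made up for illustration.

```python
from urllib.parse import urljoin


def extract_product_urls(components, prefix="https://www.sephora.com"):
    """Collect product URLs from every component that has props.products."""
    urls = []
    for component in components:
        if not isinstance(component, dict):
            continue
        props = component.get("props")
        if not isinstance(props, dict):
            continue
        # "products" may be missing or None, so fall back to an empty list
        for product in props.get("products") or []:
            target = product.get("targetUrl") if isinstance(product, dict) else None
            if target:
                # urljoin avoids doubled or missing slashes between the parts
                urls.append(urljoin(prefix, target))
    return urls


# Made-up sample that mimics the structure used in the answer above
sample = [
    {"class": "Header"},
    {"props": {"products": [{"targetUrl": "/product/a-P1"},
                            {"targetUrl": "/product/b-P2"}]}},
]
print(extract_product_urls(sample))
```

With the real page you would pass the list returned by json.loads(...) in place of sample.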

By inspecting the JSON data, you can find where the links are stored. To view the JSON more clearly, I use this site: https://codebeautify.org/jsonviewer
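
If you'd rather locate the links programmatically than click through a viewer, a small recursive search can list every path at which a given key occurs. This is only a sketch: the helper name find_key_paths and the demo data are hypothetical, and the real page's JSON may be shaped differently.

```python
def find_key_paths(node, key, path=()):
    """Yield (path, value) for every occurrence of `key` in nested JSON data."""
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                yield path + (k,), v
            # Keep descending, since the key may also appear deeper down
            yield from find_key_paths(v, key, path + (k,))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from find_key_paths(item, key, path + (i,))


# Made-up data shaped like the structure the answer indexes into
demo = [
    {"class": "Header"},
    {"props": {"products": [{"targetUrl": "/product/a-P1"}]}},
]
for found_path, value in find_key_paths(demo, "targetUrl"):
    print(found_path, value)
```

Each printed path is the sequence of list indexes and dict keys you would use to reach the value, which is how an expression like data[3]['props']['products'] can be worked out.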

Scraping href links on the Sephora website
