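I'm very new to Python, and to coding in general. I have successfully scraped about 10 websites with this code, but unfortunately it doesn't work for this site. I want to extract all the divs for each product category, but the divs don't appear in page_soup. I read that ::before and ::after might be the problem, but I couldn't find a working solution. There may be several problems in my code, but I can't find them. I've had two weeks of sleepless nights over this. Please help.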
from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup
url = "https://www.woolworths.co.za/cat?No=60&Nrpp=60&Ns=p_pl30|0"
uClient=uReq(url)
page_html=uClient.read()
uClient.close()
page_soup=soup(page_html,"html.parser")
containers=page_soup.findAll("div",{"class":"product-list__item"})
print(containers)
quit()
The result I get: []
Let me know if I need to provide more information.
Answer 0 (score: 0)
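The product divs appear to be filled in by JavaScript after the page loads, so they are not in the HTML that urlopen returns. By sending a GET request with the correct headers to:
https://www.woolworths.co.za/server/searchCategory?pageURL=%2Fcat&No=60&Nrpp=60&Ns=p_pl30%7C0
you can get the data with the requests module alone, with no need for BeautifulSoup: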
import requests

# Browser-like headers; the endpoint may reject requests without them.
headers = {
    "Referer": "https://www.woolworths.co.za/cat?No=60&Nrpp=60&Ns=p_pl30|0",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
}
URL = "https://www.woolworths.co.za/server/searchCategory?pageURL=%2Fcat&No=60&Nrpp=60&Ns=p_pl30%7C0"
# The endpoint returns JSON, so decode it straight into Python dicts and lists.
response = requests.get(URL, headers=headers).json()

fmt_string = "{:<70} {:<15} {}"
print(fmt_string.format("Brand", "Price", "Image"))
print("-" * 200)

# The product records are nested a few levels down in the payload.
for d in response["contents"][0]["mainContent"][0]["contents"]:
    for dd in d["records"]:
        print(
            fmt_string.format(
                dd["attributes"]["p_displayName"],
                dd["startingPrice"]["p_pl30"],
                "https://images.woolworthsstatic.co.za/"
                + dd["attributes"]["p_externalImageReference"],
            )
        )
Output (truncated):
Brand Price Image
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Soda Water Sparkling Drink 200 ml 9.49 https://images.woolworthsstatic.co.za/Soda-Water-Sparkling-Drink-200-ml-6001009034250.jpg?V=50$J&o=eucyUmAbqcqMSs0IjPaS4WA$mzoj&
Salt & Vinegar Flavoured Potato Crisps 36 g 9.49 https://images.woolworthsstatic.co.za/Salt-Vinegar-Flavoured-Potato-Crisps-36-g-6009175413541.jpg?V=6Pfl&o=Tyz@wbWHKvnW@Kc69RTJYM7WBUQj&
Salted Farmer's Crisps 36 g 9.49 https://images.woolworthsstatic.co.za/Salted-Farmer-s-Crisps-36-g-6009217630752.jpg?V=TM2n&o=eR0n3eqV0@15TKcRSRz1RzzVPW8j&
Lemonade Sugar Free Sparkling Flavoured Drink 200 ml 9.49 https://images.woolworthsstatic.co.za/Lemonade-Sugar-Free-Sparkling-Flavoured-Drink-200-ml-6001009014238.jpg?V=9Skb&o=hqcUN6THi9J8YRixCQEica2ftcMj&
Ginger Ale Sugar Free Sparkling Flavoured Drink 200 ml 9.49 https://images.woolworthsstatic.co.za/Ginger-Ale-Sugar-Free-Sparkling-Flavoured-Drink-200-ml-6001009014245.jpg?V=x9Pr&o=C7oKpoX27D3z2vf11X7bKRmUJEsj&
Cheddar Flavoured Crisps 36 g 9.49 https://images.woolworthsstatic.co.za/Cheddar-Flavoured-Crisps-36-g-6009217630776.jpg?V=zCPn&o=Ufn3jhhUzUGckf72QHLRaa64g20j&
...
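If you want to reuse this for other pages, the URL building and the nested loop can be pulled into two small helpers. This is only a sketch: build_search_url and iter_products are names I made up, the reading of No as a record offset and Nrpp as the page size is a guess from the parameter names, and the sample payload is a hand-written stand-in that only mimics the keys used above, so the example runs without touching the network.

```python
from urllib.parse import urlencode

BASE = "https://www.woolworths.co.za/server/searchCategory"
IMAGE_HOST = "https://images.woolworthsstatic.co.za/"


def build_search_url(offset=60, per_page=60, sort="p_pl30|0"):
    """Build the endpoint URL; urlencode escapes "/" and "|" for us."""
    # Assumption: "No" is the record offset and "Nrpp" the records per page.
    params = {"pageURL": "/cat", "No": offset, "Nrpp": per_page, "Ns": sort}
    return BASE + "?" + urlencode(params)


def iter_products(payload):
    """Yield (name, price, image_url) tuples from a decoded JSON payload."""
    for block in payload["contents"][0]["mainContent"][0]["contents"]:
        # Default to an empty list in case a block carries no "records" key.
        for record in block.get("records", []):
            attrs = record["attributes"]
            yield (
                attrs["p_displayName"],
                record["startingPrice"]["p_pl30"],
                IMAGE_HOST + attrs["p_externalImageReference"],
            )


# Hypothetical stand-in for the real response, shaped like the keys used above.
sample = {
    "contents": [{
        "mainContent": [{
            "contents": [
                {"records": [{
                    "attributes": {
                        "p_displayName": "Example Product",
                        "p_externalImageReference": "example.jpg",
                    },
                    "startingPrice": {"p_pl30": 9.49},
                }]},
                {},  # a block without records is skipped
            ]
        }]
    }]
}

print(build_search_url())
for name, price, image in iter_products(sample):
    print("{:<70} {:<15} {}".format(name, price, image))
```

With the real site you would pass requests.get(build_search_url(), headers=headers).json() to iter_products instead of sample.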