::before ::after: web scraping with Python returns [ ]

Asked: 2021-05-06 00:06:23

Tags: python css web-scraping beautifulsoup pseudo-element

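I'm very new to Python, and to coding in general. I have successfully scraped about 10 websites with this code, but unfortunately it doesn't work on this one. I want to extract all the divs for each product category, but the divs don't appear in page_soup. I've read that ::before and ::after can be a problem, but I couldn't find a working solution. There may be several problems in my code, but I can't find them. I've had two weeks of sleepless nights over this. Please help.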

from urllib.request import  urlopen as uReq
from bs4 import BeautifulSoup as soup

url = "https://www.woolworths.co.za/cat?No=60&Nrpp=60&Ns=p_pl30|0"

uClient=uReq(url)
page_html=uClient.read()
uClient.close()

page_soup=soup(page_html,"html.parser")
containers=page_soup.findAll("div",{"class":"product-list__item"})
print(containers)
        
quit()

The result I get is [ ]

Please let me know if I need to provide more information.

1 answer:

Answer 0 (score: 0)

By sending a GET request with the right headers to:

https://www.woolworths.co.za/server/searchCategory?pageURL=%2Fcat&No=60&Nrpp=60&Ns=p_pl30%7C0

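you can get the data using only the requests module, with no need for BeautifulSoup. The product list is presumably filled in by JavaScript after the page loads, which would explain why the divs are missing from page_soup; ::before and ::after are CSS pseudo-elements and do not remove elements from the HTML.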

import requests


# Send a Referer and a browser-like User-Agent, as a browser would.
headers = {
    "Referer": "https://www.woolworths.co.za/cat?No=60&Nrpp=60&Ns=p_pl30|0",
    "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.93 Safari/537.36",
}

URL = "https://www.woolworths.co.za/server/searchCategory?pageURL=%2Fcat&No=60&Nrpp=60&Ns=p_pl30%7C0"
# The endpoint returns JSON, so the body is decoded directly into a dict.
response = requests.get(URL, headers=headers).json()
fmt_string = "{:<70} {:<15} {}"

print(fmt_string.format("Brand", "Price", "Image"))
print("-" * 200)

# Product records are nested under contents -> mainContent -> contents -> records.
for d in response["contents"][0]["mainContent"][0]["contents"]:
    for dd in d["records"]:
        print(
            fmt_string.format(
                dd["attributes"]["p_displayName"],
                dd["startingPrice"]["p_pl30"],
                "https://images.woolworthsstatic.co.za/"
                + dd["attributes"]["p_externalImageReference"],
            )
        )

Output (truncated):

Brand                                                                  Price           Image
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Soda Water Sparkling Drink 200 ml                                      9.49            https://images.woolworthsstatic.co.za/Soda-Water-Sparkling-Drink-200-ml-6001009034250.jpg?V=50$J&o=eucyUmAbqcqMSs0IjPaS4WA$mzoj&
Salt & Vinegar Flavoured Potato Crisps 36 g                            9.49            https://images.woolworthsstatic.co.za/Salt-Vinegar-Flavoured-Potato-Crisps-36-g-6009175413541.jpg?V=6Pfl&o=Tyz@wbWHKvnW@Kc69RTJYM7WBUQj&
Salted Farmer's Crisps 36 g                                            9.49            https://images.woolworthsstatic.co.za/Salted-Farmer-s-Crisps-36-g-6009217630752.jpg?V=TM2n&o=eR0n3eqV0@15TKcRSRz1RzzVPW8j&
Lemonade Sugar Free Sparkling Flavoured Drink 200 ml                   9.49            https://images.woolworthsstatic.co.za/Lemonade-Sugar-Free-Sparkling-Flavoured-Drink-200-ml-6001009014238.jpg?V=9Skb&o=hqcUN6THi9J8YRixCQEica2ftcMj&
Ginger Ale Sugar Free Sparkling Flavoured Drink 200 ml                 9.49            https://images.woolworthsstatic.co.za/Ginger-Ale-Sugar-Free-Sparkling-Flavoured-Drink-200-ml-6001009014245.jpg?V=x9Pr&o=C7oKpoX27D3z2vf11X7bKRmUJEsj&
Cheddar Flavoured Crisps 36 g                                          9.49            https://images.woolworthsstatic.co.za/Cheddar-Flavoured-Crisps-36-g-6009217630776.jpg?V=zCPn&o=Ufn3jhhUzUGckf72QHLRaa64g20j&
...
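
To fetch other pages, the approach above could be generalized along the following lines. This is only a sketch under assumptions: the helper names build_url and extract_rows are made up for illustration, the reading of No as a record offset and Nrpp as records per page is a guess based on the URL rather than documented behavior, and the sample payload is hypothetical data that merely mimics the nesting used in the loop above.

```python
from urllib.parse import urlencode

BASE = "https://www.woolworths.co.za/server/searchCategory"
IMAGE_BASE = "https://images.woolworthsstatic.co.za/"


def build_url(offset, per_page=60, sort="p_pl30|0"):
    """Build the endpoint URL (hypothetical helper).

    Assumption: "No" is the record offset and "Nrpp" the page size.
    """
    query = {"pageURL": "/cat", "No": offset, "Nrpp": per_page, "Ns": sort}
    # urlencode percent-encodes "/" and "|" just like the URL in the answer.
    return BASE + "?" + urlencode(query)


def extract_rows(payload):
    """Yield (name, price, image_url) tuples from a decoded JSON payload."""
    for block in payload["contents"][0]["mainContent"][0]["contents"]:
        # Blocks without a "records" list are skipped instead of raising KeyError.
        for record in block.get("records", []):
            attrs = record["attributes"]
            yield (
                attrs["p_displayName"],
                record["startingPrice"]["p_pl30"],
                IMAGE_BASE + attrs["p_externalImageReference"],
            )


# Hypothetical payload: invented values, same nesting as in the answer's loop.
sample = {
    "contents": [{
        "mainContent": [{
            "contents": [
                {"name": "banner"},  # a block with no records
                {"records": [{
                    "attributes": {
                        "p_displayName": "Example Crisps 36 g",
                        "p_externalImageReference": "example.jpg",
                    },
                    "startingPrice": {"p_pl30": 9.49},
                }]},
            ]
        }]
    }]
}

rows = list(extract_rows(sample))
print(build_url(60))
for row in rows:
    print(row)
```

With the real site, the decoded result of requests.get(build_url(offset), headers=headers).json() would take the place of sample, provided the endpoint still responds with the same structure.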