How do you properly scrape the NVIDIA website with Python?

Time: 2021-06-25 19:36:51

Tags: python html web-scraping beautifulsoup python-requests

I am using BeautifulSoup4 and requests.

from bs4 import BeautifulSoup
import requests

url_NVIDIAGEFORCE = f'https://shop.nvidia.com/de-de/geforce/store/gpu/?page=1&limit=9&locale=de-de&category=GPU&gpu=RTX%203090,RTX%203080%20Ti,RTX%203080,RTX%203070%20Ti,RTX%203070,RTX%203060%20Ti,RTX%203060&gpu_filter=RTX%203090~12,RTX%203080%20Ti~7,RTX%203080~16,RTX%203070%20Ti~3,RTX%203070~18,RTX%203060%20Ti~8,RTX%203060~2,RTX%202080%20SUPER~1,RTX%202080~0,RTX%202070%20SUPER~0,RTX%202070~0,RTX%202060~6,GTX%201660%20Ti~0,GTX%201660%20SUPER~9,GTX%201660~8,GTX%201650%20Ti~0,GTX%201650%20SUPER~3,GTX%201650~17'


page = requests.get(url_NVIDIAGEFORCE).text
soup = BeautifulSoup(page, "lxml")
match = soup.find('div', class_='product_detail_78')
print(match)

After a few seconds, I get the output:

None

A div with this class definitely exists; I copied it from the website.

1 answer:

Answer 0: (score: 0)

The data is loaded as JSON from an external URL, so the div is not in the HTML that requests downloads. You can use this example to parse it:

import json
import requests


url = "https://api.nvidia.partners/edge/product/search?page=1&limit=9&locale=de-de&category=GPU&gpu=RTX%203090,RTX%203080%20Ti,RTX%203080,RTX%203070%20Ti,RTX%203070,RTX%203060%20Ti,RTX%203060&gpu_filter=RTX%203090~12,RTX%203080%20Ti~7,RTX%203080~16,RTX%203070%20Ti~3,RTX%203070~18,RTX%203060%20Ti~8,RTX%203060~2,RTX%202080%20SUPER~1,RTX%202080~0,RTX%202070%20SUPER~0,RTX%202070~0,RTX%202060~6,GTX%201660%20Ti~0,GTX%201660%20SUPER~9,GTX%201660~8,GTX%201650%20Ti~0,GTX%201650%20SUPER~3,GTX%201650~17"
headers = {
    "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:89.0) Gecko/20100101 Firefox/89.0"
}

data = requests.get(url, headers=headers).json()

# uncomment to print all data:
# print(json.dumps(data, indent=4))

for p in data["searchedProducts"]["productDetails"]:
    print("{:<50} {}".format(p["productTitle"], p["productPrice"]))
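The long query string and the direct key lookups above can be made easier to maintain. The following is only a sketch under stated assumptions: `build_search_url` and `product_rows` are hypothetical helper names, the `sample` payload is made up to mimic the shape used above, and the `gpu_filter` parameter is left out (whether the API accepts a request without it is untested).

```python
from urllib.parse import quote, urlencode

# Endpoint taken from the answer above.
API_URL = "https://api.nvidia.partners/edge/product/search"


def build_search_url(gpus, page=1, limit=9, locale="de-de"):
    """Build the search URL from a list of GPU names (hypothetical helper)."""
    params = {
        "page": page,
        "limit": limit,
        "locale": locale,
        "category": "GPU",
        "gpu": ",".join(gpus),
    }
    # quote_via=quote encodes spaces as %20, as in the original URL;
    # safe="," keeps the commas between GPU names literal.
    return API_URL + "?" + urlencode(params, quote_via=quote, safe=",")


def product_rows(data):
    """Return (title, price) pairs, tolerating missing keys in the payload."""
    details = (data.get("searchedProducts") or {}).get("productDetails") or []
    return [(p.get("productTitle", "?"), p.get("productPrice", "?")) for p in details]


# Offline demo on a made-up payload (not real API output).
sample = {
    "searchedProducts": {
        "productDetails": [
            {"productTitle": "Example GPU A", "productPrice": "1234.00 EUR"},
            {"productTitle": "Example GPU B"},  # price missing on purpose
        ]
    }
}

for title, price in product_rows(sample):
    print("{:<50} {}".format(title, price))
```

To try it against the live endpoint, the response from `requests.get(build_search_url(["RTX 3090", "RTX 3080"]), headers=headers).json()` can be passed to `product_rows` in place of `sample`.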