How do I find a specific class when web scraping with bs4?

Time: 2020-12-26 14:09:06

Tags: python web web-scraping beautifulsoup python-requests

I'm trying to write a scraper that extracts the product ID of a product on a website.

import requests
from bs4 import BeautifulSoup

URL = 'https://stockx.com/de-de/air-jordan-1-retro-high-dark-mocha'
headers = {
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36'
}


r = requests.get(URL, headers=headers)

soup = BeautifulSoup(r.text, 'html.parser')

soup.find('div', {'class':'detail'})
print(soup)

I want to access the element with class="detail", but when I run this it prints the HTML of the entire site. What am I doing wrong?

1 answer:

Answer 0 (score: 0)

What went wrong

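  • You assigned the whole parsed page to soup with soup = BeautifulSoup(r.text, 'html.parser') and then called print(soup), so it prints the entire HTML.
  • soup.find(...) returns the matching element but does not change soup, and you discarded its return value. Assign the result and print that instead: detail = soup.find('div', {'class':'detail'})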
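The difference between the parsed document and the result of find() can be seen offline with a small snippet. This is a sketch that assumes a hypothetical HTML string standing in for the real page (the actual StockX markup is not shown in the question, and "ABC-123" is a made-up ID):

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page; the ID is made up.
html = """
<html><body>
  <h1>Example product</h1>
  <div class="detail"><span>Style</span> ABC-123</div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() leaves soup untouched and returns the first matching Tag (or None).
detail = soup.find("div", {"class": "detail"})

print(type(soup).__name__)    # BeautifulSoup: the whole parsed document
print(type(detail).__name__)  # Tag: only the matched <div>
print(detail)                 # just the <div class="detail"> element
```

Printing soup after the find() call would still show the <h1> and everything else, because find() only returns a reference to a node inside the tree.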

Try this:

import requests
from bs4 import BeautifulSoup

URL = 'https://stockx.com/de-de/air-jordan-1-retro-high-dark-mocha'
headers = {
    'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36'
}


r = requests.get(URL, headers=headers)

soup = BeautifulSoup(r.text, 'html.parser')

# find() returns the first matching <div> (or None); it does not modify soup
detail = soup.find('div', {'class':'detail'})
print(detail)
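If print(detail) shows None, the element is not in the HTML that requests received (for example, because the page builds it with JavaScript or the server returns a different page to scripts). A slightly more defensive sketch is below; the helper names extract_detail_text and extract_detail_text_css are hypothetical, and the sample markup is made up. It shows the class_ keyword form of the same search and an equivalent CSS selector via select_one():

```python
from typing import Optional

from bs4 import BeautifulSoup


def extract_detail_text(html: str) -> Optional[str]:
    """Return the text of the first <div class="detail">, or None if absent."""
    soup = BeautifulSoup(html, "html.parser")
    # class_ (trailing underscore) is the keyword form of {'class': 'detail'};
    # it also matches elements with extra classes, e.g. class="detail big".
    detail = soup.find("div", class_="detail")
    if detail is None:
        return None
    return detail.get_text(" ", strip=True)


def extract_detail_text_css(html: str) -> Optional[str]:
    """Same lookup using a CSS selector via select_one()."""
    soup = BeautifulSoup(html, "html.parser")
    detail = soup.select_one("div.detail")
    return detail.get_text(" ", strip=True) if detail is not None else None


# Hypothetical sample markup; the real page may differ or omit this element.
sample = '<div class="detail big"><span>Style</span> ABC-123</div>'
print(extract_detail_text(sample))                  # Style ABC-123
print(extract_detail_text_css(sample))              # Style ABC-123
print(extract_detail_text("<p>no detail here</p>")) # None
```

In the original script you would pass r.text to one of these helpers, ideally after calling r.raise_for_status() so an HTTP error page is not silently parsed.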