I'm trying to write a scraper that collects the product IDs of products on a website.
import requests
from bs4 import BeautifulSoup
URL = 'https://stockx.com/de-de/air-jordan-1-retro-high-dark-mocha'
headers = {
'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36'
}
r = requests.get(URL, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
soup.find('div', {'class':'detail'})
print(soup)
I want to access the element with class="detail", but when I run this it prints the HTML of the entire page. What am I doing wrong?
Answer 0 (score: 0)
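What went wrong: you are printing soup, which is the whole parsed document created by
soup = BeautifulSoup(r.text, 'html.parser')
so it prints the entire HTML. soup.find() does not modify soup; it returns the matching element (or None if nothing matches), and your code throws that return value away. Store the result in a variable and print that instead:
detail = soup.find('div', {'class':'detail'})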
Try this:
import requests
from bs4 import BeautifulSoup
URL = 'https://stockx.com/de-de/air-jordan-1-retro-high-dark-mocha'
headers = {
'user-agent': 'Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/87.0.4280.88 Mobile Safari/537.36'
}
r = requests.get(URL, headers=headers)
soup = BeautifulSoup(r.text, 'html.parser')
# find() returns the first matching tag (or None); keep the result instead of discarding it
detail = soup.find('div', {'class':'detail'})
print(detail)
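
To see the behaviour without hitting the network, here is a minimal sketch that runs find() on a small inline HTML string. The markup, the "ABC-123" value, and the names SAMPLE_HTML, missing, and detail_text are made up for illustration; the real page may be structured differently, and find() may return None there if no such div exists in the HTML that requests receives.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page (an assumption, not the actual site HTML).
SAMPLE_HTML = """
<html><body>
  <div class="header">Shop</div>
  <div class="detail">Product ID: ABC-123</div>
</body></html>
"""

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')

# find() leaves soup untouched and returns the first match, or None if there is none.
detail = soup.find('div', {'class': 'detail'})
missing = soup.find('div', {'class': 'does-not-exist'})

if detail is not None:
    # get_text(strip=True) returns the tag's text without surrounding whitespace.
    detail_text = detail.get_text(strip=True)
    print(detail_text)
else:
    detail_text = None
    print('No div with class "detail" found')
```

Checking for None before using the result avoids an AttributeError when the element is not present in the downloaded HTML.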