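I am following the online tutorial at https://www.youtube.com/watch?v=nCuPv3tf2Hg&list=PLRzwgpycm-Fio7EyivRKOBN4D3tfQ_rpu&index=1. I can't work out what I'm doing wrong. I have tried the code in both Visual Studio and a Jupyter notebook, and it does not work in either.
Code: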
import requests
from bs4 import BeautifulSoup as bs
bURL = 'https://www.thewhiskyexchange.com/c/540/taiwanese-whisky'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
r = requests.get('https://www.thewhiskyexchange.com/c/540/taiwanese-whisky')
soup = bs(r.content, 'lxml')
productlist = soup.find_all('div', class_='item')
productlinks = []
for item in productlist:
    for link in item.find_all('a', href=True):
        print(link['href'])
Answer 0 (score: 2)
The site's structure has changed since the video was published.
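One way to confirm this kind of change is to count how often a given tag and class pair occurs in the downloaded HTML. The sketch below is only an illustration: it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs, and sample_html is made-up markup standing in for r.text. Only product-grid__item is taken from the fix; the class names product-grid__list and product-card are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class ClassCounter(HTMLParser):
    """Count how often each (tag, class) pair appears in a page."""

    def __init__(self):
        super().__init__()
        self.counts = Counter()

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; the value may be None.
        class_value = dict(attrs).get("class") or ""
        for class_name in class_value.split():
            self.counts[(tag, class_name)] += 1


# Made-up markup standing in for the real response body (assumption).
sample_html = """
<ul class="product-grid__list">
  <li class="product-grid__item"><a class="product-card" href="/p/1/example-a">Example A</a></li>
  <li class="product-grid__item"><a class="product-card" href="/p/2/example-b">Example B</a></li>
</ul>
"""

counter = ClassCounter()
counter.feed(sample_html)

# The selector from the tutorial versus the one used in the fix.
old_matches = counter.counts[("div", "item")]
new_matches = counter.counts[("li", "product-grid__item")]
print("div.item:", old_matches)
print("li.product-grid__item:", new_matches)
```

If the old selector yields no matches while another one does, the loop in the question never runs, which would explain why nothing is printed.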
I have fixed your code below:
import requests
from bs4 import BeautifulSoup as bs
bURL = 'https://www.thewhiskyexchange.com/c/540/taiwanese-whisky'
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36'}
r = requests.get(bURL, headers=headers)
soup = bs(r.text, 'html.parser')
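# Each product now sits in an <li class="product-grid__item"> element
for x in soup.find_all('li', {'class':'product-grid__item'}):
    link = x.find('a')
    # The href is site-relative, so prepend the domain
    print(x.text, 'https://www.thewhiskyexchange.com'+link['href'])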
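The fix above builds absolute links by string concatenation, which assumes every href is site-relative. As a minimal sketch of the same step, urllib.parse.urljoin from the standard library also copes with hrefs that are already absolute. The href values below are invented for illustration and are not real product pages.

```python
from urllib.parse import urljoin

BASE_URL = "https://www.thewhiskyexchange.com"

# Hypothetical href values, as they might appear in <a> tags.
sample_hrefs = [
    "/p/1/example-a",                                   # site-relative
    "https://www.thewhiskyexchange.com/p/2/example-b",  # already absolute
]

# urljoin resolves each href against the base URL.
absolute_links = [urljoin(BASE_URL, href) for href in sample_hrefs]
for url in absolute_links:
    print(url)
```

In the loop above, this would mean replacing the concatenation with urljoin(bURL, link['href']).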