How can I scrape the href links from this website?

Asked: 2021-06-04 18:33:57

Tags: python beautifulsoup

I am trying to get the individual URL of each product listed at https://www.goodricketea.com/product/darjeeling-tea. How can I do this with BeautifulSoup? Can anyone help me?

1 answer:

Answer 0 (score: 3)

To get the product links from this site, you can do the following:

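import requests
from bs4 import BeautifulSoup


url = "https://www.goodricketea.com/product/darjeeling-tea"
# Download the category page and parse it with the built-in HTML parser.
soup = BeautifulSoup(requests.get(url).content, "html.parser")

# Select every <a> that has an <h2> as a direct child (the product title links).
for a in soup.select("a:has(>h2)"):
    # The href values are site-relative, so prepend the site root.
    print("https://www.goodricketea.com" + a["href"])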

Prints:

https://www.goodricketea.com/product/darjeeling-tea/roasted-darjeeling-tea-250gm
https://www.goodricketea.com/product/darjeeling-tea/thurbo-darjeeling-tea-whole-leaf-250gm
https://www.goodricketea.com/product/darjeeling-tea/roasted-darjeeling-tea-organic-250gm
https://www.goodricketea.com/product/darjeeling-tea/roasted-darjeeling-tea-100gm
https://www.goodricketea.com/product/darjeeling-tea/thurbo-darjeeling-tea-whole-leaf-100gm
https://www.goodricketea.com/product/darjeeling-tea/thurbo-darjeeling-tea-fannings-250gm
https://www.goodricketea.com/product/darjeeling-tea/castleton-premium-muscatel-darjeeling-tea-100gm
https://www.goodricketea.com/product/darjeeling-tea/castleton-vintage-darjeeling-tea-250gm
https://www.goodricketea.com/product/darjeeling-tea/castleton-vintage-darjeeling-tea-100gm
https://www.goodricketea.com/product/darjeeling-tea/castleton-vintage-darjeeling-tea-bags-50-tea-bags
https://www.goodricketea.com/product/darjeeling-tea/castleton-vintage-darjeeling-tea-bags-100-tea-bags
https://www.goodricketea.com/product/darjeeling-tea/badamtam-exclusive-organic-darjeeling-tea-250gm
https://www.goodricketea.com/product/darjeeling-tea/badamtam-exclusive-organic-darjeeling-tea-100gm
https://www.goodricketea.com/product/darjeeling-tea/seasons-3-in-1-darjeeling-leaf-tea-150gm-first-flush-second-flush-pre-winter-flush
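
To see why the `a:has(>h2)` selector picks out only the product links, here is a minimal offline sketch. The HTML in `SAMPLE_HTML` is hypothetical, invented for illustration, and the real page's markup may differ. The helper name `extract_product_links` is also an assumption. The sketch swaps the string concatenation for `urllib.parse.urljoin`, which also handles hrefs that are already absolute.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.goodricketea.com/product/darjeeling-tea"

# Hypothetical markup for illustration only; the live page may be structured differently.
SAMPLE_HTML = """
<div class="product">
  <a href="/product/darjeeling-tea/roasted-darjeeling-tea-250gm"><h2>Roasted Darjeeling Tea 250gm</h2></a>
  <a href="/cart"><span>Add to cart</span></a>
</div>
<div class="product">
  <a href="/product/darjeeling-tea/roasted-darjeeling-tea-100gm"><h2>Roasted Darjeeling Tea 100gm</h2></a>
</div>
"""


def extract_product_links(html, base_url):
    """Return absolute URLs of all <a> tags that have an <h2> as a direct child."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for a in soup.select("a:has(> h2)"):
        href = a.get("href")  # skip anchors without an href instead of raising KeyError
        if href:
            links.append(urljoin(base_url, href))
    return links


links = extract_product_links(SAMPLE_HTML, BASE_URL)
for link in links:
    print(link)
```

The "Add to cart" anchor is skipped because its direct child is a `<span>`, not an `<h2>`. To run this against the live site, pass `requests.get(url).content` as the `html` argument.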