Getting a link from a web page using requests and bs4

Time: 2020-03-26 02:44:45

Tags: beautifulsoup python-requests

I am trying to get the link to the latest droplist from https://www.supremecommunity.com/season/spring-summer2020/droplists/

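If you right-click "Latest" and choose "Inspect", the browser shows the markup for that box.

The link changes every week, so I am trying to pull it from the page rather than hard-code it.

When I run this: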

import requests
from bs4 import BeautifulSoup

url = "https://www.supremecommunity.com/season/spring-summer2020/droplists/"
r = requests.get(url)
soup = BeautifulSoup(r.text, "html.parser")
my_data = soup.find('div', attrs={'id': 'box-latest'})

I get:

<div class="col-sm-4 col-xs-12 app-lr-pad-2" id="box-latest">
<a class="block" href="/season/spring-summer2020/droplist/2020-03-26/">
<div class="feature feature-7 boxed text-center imagebg boxedred sc-app-boxlistitem" data-overlay="7">
<div class="empty-background-image-holder">
<img alt="background" src=""/>
</div>
<h2 class="pos-vertical-center">Latest</h2>
</div>
</a>
</div>

How can I extract the "/season/spring-summer2020/droplist/2020-03-26/" part?

1 answer:

Answer 0 (score: 0)

import requests
from bs4 import BeautifulSoup

r = requests.get(
    "https://www.supremecommunity.com/season/spring-summer2020/droplists/")
soup = BeautifulSoup(r.content, "html.parser")


print(soup.find("div", id="box-latest").contents[1].get("href"))

Output:

/season/spring-summer2020/droplist/2020-03-26/
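
A note on the indexing above: with "html.parser", contents[0] of the box is the newline text node that follows the opening <div>, which is why the <a> tag sits at contents[1]. That position would shift if the page's whitespace changed. The following is a minimal sketch of a lookup by tag name instead, which also joins the relative href onto the page URL. The function name latest_droplist_url, the BASE_URL constant, and the shortened inline HTML are assumptions made for illustration; the markup is trimmed from the question so the sketch runs without a network call.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Markup trimmed from the question; used here instead of a live request.
HTML = """<div class="col-sm-4 col-xs-12 app-lr-pad-2" id="box-latest">
<a class="block" href="/season/spring-summer2020/droplist/2020-03-26/">
<h2 class="pos-vertical-center">Latest</h2>
</a>
</div>"""

# Assumed base: the page the question fetches.
BASE_URL = "https://www.supremecommunity.com/season/spring-summer2020/droplists/"


def latest_droplist_url(html, base_url=BASE_URL):
    """Return the absolute URL of the "Latest" link, or None if it is missing."""
    soup = BeautifulSoup(html, "html.parser")
    box = soup.find("div", id="box-latest")
    if box is None:
        return None
    # Look the link up by tag name rather than by its position in .contents.
    link = box.find("a", href=True)
    if link is None:
        return None
    # Resolve the site-relative href against the page URL.
    return urljoin(base_url, link["href"])


print(latest_droplist_url(HTML))
```

Against the live site you would pass r.text (or r.content) from requests.get in place of HTML. Returning None when the box or link is absent lets the caller decide how to handle a page layout change.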