BeautifulSoup: getting the href of an "a" tag inside an h2 tag

Date: 2021-05-17 11:38:08

Tags: beautifulsoup

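I'm trying to get the link from the "a" tag inside an h2 tag, but the problem is that each block contains two "a" tags, each under a separate parent tag.

The page I'm looking at is: https://emerging-europe.com/tag/poland/

Here is the code I have so far.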

from bs4 import BeautifulSoup
import requests

url='https://emerging-europe.com/tag/poland/'
response=requests.get(url)

soup=BeautifulSoup(response.content,'lxml')

for item in soup.select('.col-lg-6'):
    try:
        headline = item.find('h2', {'class':'entry-title'}).get_text()
        link = item.find('h2', {'class':'entry-title'})['href']
           
    except:
        continue

The HTML I'm referring to is shown below.

<div class="col-lg-6 col-md-6 col-sm-7">
        <div class="entry-header">
                            <span class="meta-category"><a href="https://emerging-europe.com/category/news/" class="herald-cat-210">News &amp; Analysis</a></span>
            
            <h2 class="entry-title h3"><a href="https://emerging-europe.com/news/montenegro-leads-cee-in-ilga-europes-new-rainbow-map/">Montenegro leads CEE on ILGA-Europe’s new Rainbow Map</a></h2>
                            <div class="entry-meta"><div class="meta-item herald-date"><span class="updated">May 17, 2021</span></div><div class="meta-item herald-author"><span class="vcard author"><span class="fn"><a href="https://emerging-europe.com/author/marekgrzegorczyk/">Marek Grzegorczyk</a></span></span></div></div>
                    </div>

                    <div class="entry-content">
                <p>Montenegro is Central and Eastern Europe’s best performer on the latest edition of the ILGA-Europe Rainbow Europe Map and Index, which monitors LGBTI rights across...</p>
            </div>
        
                    <a class="herald-read-more" href="https://emerging-europe.com/news/montenegro-leads-cee-in-ilga-europes-new-rainbow-map/" title="Montenegro leads CEE on ILGA-Europe’s new Rainbow Map">Read More</a>
            </div>

I want to get the "https://emerging-europe.com/news/montenegro-leads-cee-in-ilga-europes-new-rainbow-map/" link, but what I get is the "https://emerging-europe.com/category/news/" one. How do I reference the second one?

Thanks for your help!

1 answer:

Answer 0 (score: 1)

Try this to get all the article URLs:

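import requests
from bs4 import BeautifulSoup

url = "https://emerging-europe.com/tag/poland/"
# Selectors for the headline elements on the page.
css = ".entry-header .entry-title, .entry-header .entry-title a, .post-author-list .categoriesarticle .title a"

# select() returns a list of matching elements, not a soup object.
matches = BeautifulSoup(requests.get(url).text, "lxml").select(css)
# Keep only the matches that wrap an <a>; matches that are themselves <a> tags have no nested <a> and are skipped.
article_links = [m.find("a")["href"] for m in matches if m.find("a") is not None]
print("\n".join(article_links))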

Output:

https://emerging-europe.com/voices/the-zangezur-corridor-is-a-geo-economic-revolution/
https://emerging-europe.com/news/montenegro-leads-cee-in-ilga-europes-new-rainbow-map/
https://emerging-europe.com/business/made-in-emerging-europe-vinted-up-catalyst-propergate/
https://emerging-europe.com/news/polish-government-shifts-left-on-economy/
https://emerging-europe.com/news/georgias-modern-parliament-building-faces-uncertain-future-elsewhere-in-emerging-europe/
https://emerging-europe.com/after-hours/mixed-feelings-as-libeskind-reimagines-lodz/
https://emerging-europe.com/news/hungarys-united-opposition-emerging-europe-this-week/
https://emerging-europe.com/business/small-local-market-think-international-from-the-start/
https://emerging-europe.com/business/new-esg-guidelines-can-strengthen-polish-capital-market/
https://emerging-europe.com/news/why-is-the-left-propping-up-polands-right-wing-government/
https://emerging-europe.com/news/cee-should-redouble-efforts-to-end-violence-against-women/
https://emerging-europe.com/after-hours/a-century-on-the-silesian-uprisings-remains-complicated/
https://emerging-europe.com/voices/the-zangezur-corridor-is-a-geo-economic-revolution/
https://emerging-europe.com/news/montenegro-leads-cee-in-ilga-europes-new-rainbow-map/
https://emerging-europe.com/business/made-in-emerging-europe-vinted-up-catalyst-propergate/
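For the loop in the question, the underlying issue is that the href attribute lives on the "a" tag nested inside the h2, not on the h2 itself. Below is a minimal sketch of one way to address that, run against a trimmed copy of the HTML from the question so it works offline. The helper names extract_headlines and dedupe are hypothetical, and "html.parser" is swapped in for "lxml" only to avoid the extra dependency. Since the output above repeats its first three URLs, the sketch also shows an order-preserving way to drop duplicates.

```python
from bs4 import BeautifulSoup

# Trimmed copy of one post card from the question's HTML (assumption: the
# live page repeats this structure for every article).
html = """
<div class="col-lg-6 col-md-6 col-sm-7">
  <div class="entry-header">
    <span class="meta-category"><a href="https://emerging-europe.com/category/news/" class="herald-cat-210">News &amp; Analysis</a></span>
    <h2 class="entry-title h3"><a href="https://emerging-europe.com/news/montenegro-leads-cee-in-ilga-europes-new-rainbow-map/">Montenegro leads CEE on ILGA-Europe's new Rainbow Map</a></h2>
  </div>
</div>
"""


def extract_headlines(markup):
    """Return (headline, link) pairs, one per h2.entry-title that wraps a link."""
    soup = BeautifulSoup(markup, "html.parser")
    results = []
    for item in soup.select(".col-lg-6"):
        # Target the <a> inside the h2 directly, so the category link in
        # span.meta-category is never considered.
        anchor = item.select_one("h2.entry-title a[href]")
        if anchor is None:
            continue  # card without a linked headline
        results.append((anchor.get_text(strip=True), anchor["href"]))
    return results


def dedupe(links):
    """Drop repeated links while keeping the first occurrence of each."""
    return list(dict.fromkeys(links))


pairs = extract_headlines(html)
links = dedupe([link for _, link in pairs] * 2)  # doubled to simulate repeats
for headline, link in pairs:
    print(headline, link)
```

Checking for None explicitly, instead of wrapping the lookup in a bare except, means a KeyError or AttributeError caused by a wrong selector is not silently swallowed.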