BeautifulSoup - return the header that corresponds to a matching footer

Time: 2017-02-16 03:19:53

Tags: python beautifulsoup

I am using BeautifulSoup to retrieve artist names from a blog, restricted to entries that match specific music tags:

import requests
from bs4 import BeautifulSoup

r = requests.get('http://musicblog.kms-saulgau.de/tag/chillout/')
html = r.content

soup = BeautifulSoup(html, 'html.parser')

The artist names are stored here:

header = soup.find_all('header', class_= "entry-header")

And the artist tags here:

span = soup.find_all('span', class_= "tags-links")

I can get all the titles:

for each in header:
    link = each.find("a")  # first link inside the entry header
    if link:
        print(link.get_text())

Then I look for 'alternative' and 'chillout' in the same footer:

for each in span:
    if each.find("a"):
        tags = each.find("a")["href"]  # href of the first tag link only
        if "alternative" in tags:
            print(each.get_text())

So far, the code prints:

Terra Nine – The Heart of the Matter
Emmit Fenn – Blinded
Amparo – The Orchid Glacier
Alpha Minus – Satellites
Carbonates on Mars – The Song of Sol
Josey Marina – Ocean Sighs
Sunday – Only
Some Kind Of Illness – The Light
Vesna Kazensky – Raven
James Lowe – Shallow

Tags Alternative, Chillout, Indie Rock, New tracks

But what I want is to return only the entry whose footer matches, like this:

Some Kind Of Illness – The Light
Alternative, Chillout, Indie Rock, New tracks

How can I do this?

1 answer:

Answer 0: (score: 0)

for article in soup.find_all('article'):
    if article.select('a[href*="alternative"]') and article.select('a[href*="chillout"]'):
        print(article.h2.text)
        print(article.find(class_='tags-links').text)

Output:

Some Kind Of Illness – The Light
Tags Alternative, Chillout, Indie Rock, New tracks
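
The loop above works because it searches inside each article, so a title and its tags always come from the same entry. Here is a minimal, offline sketch of the same idea. SAMPLE_HTML is made-up markup that only borrows the class names from the question, and match_articles is a hypothetical helper, so the real page may need different selectors:

```python
from bs4 import BeautifulSoup

# Hypothetical markup modelled on the class names in the question;
# the structure of the real blog page may differ.
SAMPLE_HTML = """
<article>
  <header class="entry-header"><h2><a href="/a">Sunday – Only</a></h2></header>
  <footer><span class="tags-links">Tags
    <a href="/tag/chillout/">Chillout</a>, <a href="/tag/pop/">Pop</a>
  </span></footer>
</article>
<article>
  <header class="entry-header"><h2><a href="/b">Some Kind Of Illness – The Light</a></h2></header>
  <footer><span class="tags-links">Tags
    <a href="/tag/alternative/">Alternative</a>, <a href="/tag/chillout/">Chillout</a>
  </span></footer>
</article>
"""


def match_articles(html, required_tags):
    """Return (title, tag names) for articles whose tag links cover every required tag."""
    soup = BeautifulSoup(html, "html.parser")
    results = []
    for article in soup.find_all("article"):
        span = article.find("span", class_="tags-links")
        title_link = article.select_one("header.entry-header a")
        if span is None or title_link is None:
            continue  # skip articles without a title link or a tag footer
        links = span.find_all("a")
        hrefs = [a.get("href", "") for a in links]
        # Every required tag must appear in at least one tag link's href.
        if all(any(tag in href for href in hrefs) for tag in required_tags):
            title = title_link.get_text(strip=True)
            names = [a.get_text(strip=True) for a in links]
            results.append((title, names))
    return results


matches = match_articles(SAMPLE_HTML, ["alternative", "chillout"])
for title, tags in matches:
    print(title)
    print(", ".join(tags))
```

Checking every link in the footer, rather than only the first one, keeps the match independent of the order in which the tags are listed.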