Question

I'm using BeautifulSoup to retrieve artist names from a blog, filtered by a specific combination of music tags:

import requests
from bs4 import BeautifulSoup

r = requests.get('http://musicblog.kms-saulgau.de/tag/chillout/')
html = r.content

soup = BeautifulSoup(html, 'html.parser')

The artist names are stored here:

header = soup.find_all('header', class_= "entry-header")

and the artist tags here:

span = soup.find_all('span', class_= "tags-links")
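Because `header` and `span` are two separate lists, one rough way to keep each title next to its tags is to pair them positionally with `zip()`. The sketch below runs against a small, hypothetical HTML snippet modeled on the blog's markup (the live page may differ), and it assumes every post has exactly one `entry-header` and one `tags-links` span, in the same order. If any post lacks tags, the pairs silently shift out of alignment.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on the blog; the live page may differ.
HTML = """
<article>
<header class="entry-header"><h2 class="entry-title"><a href="/terra-nine/">Terra Nine – The Heart of the Matter</a></h2></header>
<footer><span class="tags-links">Tags <a href="/tag/chillout/">Chillout</a></span></footer>
</article>
<article>
<header class="entry-header"><h2 class="entry-title"><a href="/some-kind-of-illness/">Some Kind Of Illness – The Light</a></h2></header>
<footer><span class="tags-links">Tags <a href="/tag/alternative/">Alternative</a>, <a href="/tag/chillout/">Chillout</a></span></footer>
</article>
"""

soup = BeautifulSoup(HTML, "html.parser")
headers = soup.find_all("header", class_="entry-header")
spans = soup.find_all("span", class_="tags-links")

# zip() pairs the i-th header with the i-th tag span. This is only correct
# if every post has exactly one of each, in the same document order.
pairs = [(h.a.get_text(), s.get_text()) for h, s in zip(headers, spans)]

for title, tags in pairs:
    print(title, "|", tags)
```

Scoping each lookup to its enclosing `<article>` avoids relying on the two lists staying aligned.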

I can get all the titles:

for each in header:
    link = each.find("a")  # the title link inside the header
    if link:
        print(link.get_text())

Then I look for 'alternative' and 'chillout' in the same footer:

for each in span:
    link = each.find("a")  # note: only the first link in each tag span
    if link:
        tags = link["href"]
        if "alternative" in tags:
            print(each.get_text())
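The loop above only inspects the first `<a>` in each tag span, so it matches a post only when its first tag is Alternative, and it never checks for Chillout at all. A minimal sketch that checks every tag link, then walks up to the enclosing `<article>` with `find_parent()` to recover the title, is below. The HTML is again a hypothetical, simplified stand-in for the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on the blog; the live page may differ.
HTML = """
<article>
<header class="entry-header"><h2 class="entry-title"><a href="/terra-nine/">Terra Nine – The Heart of the Matter</a></h2></header>
<footer><span class="tags-links">Tags <a href="/tag/chillout/">Chillout</a></span></footer>
</article>
<article>
<header class="entry-header"><h2 class="entry-title"><a href="/some-kind-of-illness/">Some Kind Of Illness – The Light</a></h2></header>
<footer><span class="tags-links">Tags <a href="/tag/alternative/">Alternative</a>, <a href="/tag/chillout/">Chillout</a></span></footer>
</article>
"""

soup = BeautifulSoup(HTML, "html.parser")
matches = []
for span in soup.find_all("span", class_="tags-links"):
    # Collect the href of every tag link in the span, not just the first one.
    hrefs = [a["href"] for a in span.find_all("a", href=True)]
    if any("alternative" in h for h in hrefs) and any("chillout" in h for h in hrefs):
        # Walk up from the matching tag span to the post that contains it.
        article = span.find_parent("article")
        matches.append((article.h2.get_text(), span.get_text()))

for title, tags in matches:
    print(title)
    print(tags)
```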

So far, the code prints:

Terra Nine – The Heart of the Matter
Emmit Fenn – Blinded
Amparo – The Orchid Glacier
Alpha Minus – Satellites
Carbonates on Mars – The Song of Sol
Josey Marina – Ocean Sighs
Sunday – Only
Some Kind Of Illness – The Light
Vesna Kazensky – Raven
James Lowe – Shallow

Tags Alternative, Chillout, Indie Rock, New tracks

But what I want is to return only the entries whose footer matches, like this:

Some Kind Of Illness – The Light
Alternative, Chillout, Indie Rock, New tracks

How can I do this?

Answer 1

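# Loop over each post so the title and tags come from the same <article>
for article in soup.find_all('article'):
    # Keep the post only if it links to both tags somewhere inside it
    if article.select('a[href*="alternative"]') and article.select('a[href*="chillout"]'):
        print(article.h2.text)
        print(article.find(class_='tags-links').text)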

Output:

Some Kind Of Illness – The Light
Tags Alternative, Chillout, Indie Rock, New tracks
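The key idea in this answer is to loop over each `<article>`, so the title and the tags always come from the same post. A slightly generalized sketch follows: `matching_posts` is a hypothetical helper name, the HTML is a simplified stand-in for the real page, and the selector is restricted to the `tags-links` span so that a post whose title link happens to contain a tag word in its URL is not matched by accident. Note that `*=` is a substring match, so `alternative` would also match an href such as `/tag/alternative-rock/`.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup modeled on the blog; the live page may differ.
HTML = """
<article>
<header class="entry-header"><h2 class="entry-title"><a href="/terra-nine/">Terra Nine – The Heart of the Matter</a></h2></header>
<footer><span class="tags-links">Tags <a href="/tag/chillout/">Chillout</a></span></footer>
</article>
<article>
<header class="entry-header"><h2 class="entry-title"><a href="/some-kind-of-illness/">Some Kind Of Illness – The Light</a></h2></header>
<footer><span class="tags-links">Tags <a href="/tag/alternative/">Alternative</a>, <a href="/tag/chillout/">Chillout</a></span></footer>
</article>
"""


def matching_posts(soup, required_tags):
    """Return (title, tags_text) for posts that link to every required tag."""
    results = []
    for article in soup.find_all("article"):
        tag_span = article.find(class_="tags-links")
        if tag_span is None:
            continue  # post without a tag span cannot match
        # Substring match on each link's href, searched only inside the tag span.
        if all(tag_span.select(f'a[href*="{t}"]') for t in required_tags):
            results.append((article.h2.get_text(), tag_span.get_text()))
    return results


soup = BeautifulSoup(HTML, "html.parser")
both = matching_posts(soup, ["alternative", "chillout"])
chill_only = matching_posts(soup, ["chillout"])

for title, tags in both:
    print(title)
    print(tags)
```

Passing the tag list as a parameter makes it easy to reuse the same filter for other combinations without editing the loop.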

BeautifulSoup - Return the header whose footer matches
