I am using BeautifulSoup to retrieve artist names from a blog, restricted to entries that match specific music tags:
import requests
from bs4 import BeautifulSoup
r = requests.get('http://musicblog.kms-saulgau.de/tag/chillout/')
html = r.content
soup = BeautifulSoup(html, 'html.parser')
The artist names are stored here:
header = soup.find_all('header', class_= "entry-header")
and the artist tags here:
span = soup.find_all('span', class_= "tags-links")
I can get all the titles:
for each in header:
    if each.find("a"):
        title = each.find("a").get_text()
        print(title)
Then I look for 'alternative' and 'chillout' in the same footer:
for each in span:
    if each.find("a"):
        tags = each.find("a")["href"]
        if "alternative" in tags:
            print(each.get_text())
So far, the code prints:
Terra Nine – The Heart of the Matter
Emmit Fenn – Blinded
Amparo – The Orchid Glacier
Alpha Minus – Satellites
Carbonates on Mars – The Song of Sol
Josey Marina – Ocean Sighs
Sunday – Only
Some Kind Of Illness – The Light
Vesna Kazensky – Raven
James Lowe – Shallow
Tags Alternative, Chillout, Indie Rock, New tracks
But what I want is to return only the entries whose footer matches, like this:
Some Kind Of Illness – The Light
Alternative, Chillout, Indie Rock, New tracks
How can I achieve this?
Answer 0 (score: 0)
for article in soup.find_all('article'):
    if article.select('a[href*="alternative"]') and article.select('a[href*="chillout"]'):
        print(article.h2.text)
        print(article.find(class_='tags-links').text)
Output:
Some Kind Of Illness – The Light
Tags Alternative, Chillout, Indie Rock, New tracks
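The same per-article idea can be tried offline. The sketch below is only an illustration, not the blog's real markup: SAMPLE_HTML is a made-up snippet that assumes each post is an article element containing a header with class entry-header and a span with class tags-links, and entries_with_tags is a hypothetical helper name. It compares whole tag names rather than href substrings, so 'alternative' would not also match a tag such as 'alternative-rock'.

```python
from bs4 import BeautifulSoup

# Hypothetical markup mimicking the structure described in the question;
# the real blog's HTML may differ.
SAMPLE_HTML = """
<article>
  <header class="entry-header"><h2><a href="/the-light/">Some Kind Of Illness – The Light</a></h2></header>
  <footer><span class="tags-links">Tags <a href="/tag/alternative/">Alternative</a>, <a href="/tag/chillout/">Chillout</a></span></footer>
</article>
<article>
  <header class="entry-header"><h2><a href="/shallow/">James Lowe – Shallow</a></h2></header>
  <footer><span class="tags-links">Tags <a href="/tag/chillout/">Chillout</a>, <a href="/tag/pop/">Pop</a></span></footer>
</article>
"""


def entries_with_tags(html, required):
    """Return (title, tag names) for each article that carries every required tag."""
    soup = BeautifulSoup(html, "html.parser")
    wanted = {name.lower() for name in required}
    matches = []
    for article in soup.find_all("article"):
        # Collect the visible tag names from this article's own footer only.
        names = [a.get_text(strip=True) for a in article.select("span.tags-links a")]
        # Subset test: every wanted tag must appear among this article's tags.
        if wanted <= {n.lower() for n in names}:
            title = article.find("header", class_="entry-header").get_text(strip=True)
            matches.append((title, names))
    return matches


for title, names in entries_with_tags(SAMPLE_HTML, ["Alternative", "Chillout"]):
    print(title)
    print(", ".join(names))
```

Looping over each article keeps a title and its tags together, which is what the original two separate loops over header and span could not do.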