I'm trying to scrape some song lyrics from Genius. I wrote the following function:
import requests
from bs4 import BeautifulSoup

def get_song_lyrics(link):
    response = requests.get(link)
    soup = BeautifulSoup(response.text, "html.parser")
    lyrics = soup.find("div", attrs={'class': 'lyrics'}).find("p").get_text()
    return [i for i in lyrics.splitlines()]
I don't understand why
get_song_lyrics('https://genius.com/Kanye-west-black-skinhead-lyrics')
raises:
AttributeError: 'NoneType' object has no attribute 'find'
while:
get_song_lyrics('https://genius.com/Kanye-west-hold-my-liquor-lyrics')
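correctly returns the song's lyrics. Both pages have the same layout. Can someone help me figure this out?
Answer 0 (score: 2)
The page is served in two different HTML versions. You can use this script to handle both of them: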
import requests
from bs4 import BeautifulSoup

url = 'https://genius.com/Kanye-west-black-skinhead-lyrics'
soup = BeautifulSoup(requests.get(url).content, 'lxml')

for tag in soup.select('div[class^="Lyrics__Container"], .song_body-lyrics p'):
    for i in tag.select('i'):
        i.unwrap()
    tag.smooth()
    t = tag.get_text(strip=True, separator='\n')
    if t:
        print(t)
Prints:
[Produced By Daft Punk & Kanye West]
[Verse 1]
For my theme song (Black)
My leather black jeans on (Black)
My by-any-means on
...and so on.
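If you would rather keep the question's function shape (a list of lines returned instead of printed), the same selector can be wrapped roughly as below. This is only a sketch: `extract_lyrics` is a name made up here, the two HTML strings are heavily simplified stand-ins for the two page versions rather than real Genius markup, and it uses `html.parser` so it does not depend on `lxml`.

```python
from bs4 import BeautifulSoup

# Same selector as in the script above: the "Lyrics__Container..." divs
# and the ".song_body-lyrics p" markup, combined with a comma.
LYRICS_SELECTOR = 'div[class^="Lyrics__Container"], .song_body-lyrics p'


def extract_lyrics(html):
    """Return the lyric lines found in html, or [] if neither layout matches."""
    soup = BeautifulSoup(html, "html.parser")
    lines = []
    for tag in soup.select(LYRICS_SELECTOR):
        # Drop <i> wrappers and merge the text pieces they leave behind,
        # so italic words do not end up on lines of their own.
        for i in tag.select("i"):
            i.unwrap()
        tag.smooth()
        text = tag.get_text(strip=True, separator="\n")
        lines.extend(line for line in text.splitlines() if line)
    return lines


# Hypothetical, simplified samples of the two page versions (assumptions).
old_layout = (
    '<div class="song_body-lyrics"><div class="lyrics">'
    "<p>[Verse 1]<br/>For my <i>theme</i> song</p></div></div>"
)
new_layout = '<div class="Lyrics__Container-abc">[Verse 1]<br/>For my theme song</div>'

print(extract_lyrics(old_layout))
print(extract_lyrics(new_layout))
```

For a real page you would pass `requests.get(url).text` as `html`. An empty list then means neither layout was found, which is easier to handle than an `AttributeError`.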
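Answer 1 (score: 1)
I'm not sure what causes it, but it looks like BeautifulSoup sometimes succeeds and sometimes doesn't, and this isn't due to your code. If the call fails, one workaround is to re-run the function: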
import requests
from bs4 import BeautifulSoup

def get_song_lyrics(link, retries=5):
    response = requests.get(link)
    soup = BeautifulSoup(response.text, "html.parser")
    try:
        lyrics = soup.find("div", attrs={'class': 'lyrics'}).find("p").get_text()
        return lyrics.splitlines()
    except AttributeError:
        # The expected markup was missing. Retry, but only a limited number
        # of times (the cap of 5 is arbitrary) so the recursion cannot run forever.
        if retries <= 0:
            raise
        return get_song_lyrics(link, retries - 1)

get_song_lyrics('https://genius.com/Kanye-west-black-skinhead-lyrics')
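The `AttributeError` comes from `soup.find(...)` returning `None` when no matching `div` exists, so the chained `.find("p")` is called on `None`. A variant of the retry idea that checks for `None` explicitly and stops after a fixed number of attempts might look like this. It is a sketch under assumptions: `parse_old_layout` and `lyrics_with_retry` are names invented here, and `fetch_html` is a stand-in for something like `lambda: requests.get(link).text`, simulated below with two made-up HTML strings.

```python
from bs4 import BeautifulSoup


def parse_old_layout(html):
    """Return lyric lines from the div.lyrics layout, or None if it is absent."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.find("div", attrs={"class": "lyrics"})
    if container is None:  # find() returns None when nothing matches
        return None
    paragraph = container.find("p")
    if paragraph is None:
        return None
    return paragraph.get_text().splitlines()


def lyrics_with_retry(fetch_html, max_attempts=3):
    """Call fetch_html() up to max_attempts times until parsing succeeds."""
    for _ in range(max_attempts):
        lines = parse_old_layout(fetch_html())
        if lines is not None:
            return lines
    return None  # every attempt returned a page without div.lyrics


# Simulated responses (assumptions): the first lacks div.lyrics, the second has it.
pages = iter([
    '<div class="Lyrics__Container-abc">For my theme song</div>',
    '<div class="lyrics"><p>For my theme song\nMy leather black jeans on</p></div>',
])
print(lyrics_with_retry(lambda: next(pages)))
```

Here the first simulated response is skipped and the second one is parsed. If the site never returns the expected layout, the loop ends and returns `None` instead of recursing indefinitely.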