我正在写一个网络刮刀并试图找回德雷克的歌词。 我的刮刀必须访问一个站点(主要的metrolyrics网站),然后访问每个单独的歌曲链接,然后打印出歌词。
我在访问第二个链接时遇到问题。我在BeautifulSoup上搜索过,我很困惑。我想知道你是否可以提供帮助。
# this is intended to print all of the drake song lyrics on metrolyrics
from pyquery import PyQuery as pq
from lxml import etree
import requests
from bs4 import BeautifulSoup
# this visits the website
response = requests.get('http://www.metrolyrics.com/drake-lyrics.html')
# this separates the different types of content
doc = pq(response.content)
# this finds the titles in the content
titles = doc('.title')
# this visits each title, then prints each verse
for title in titles:
# this visits each title
response_title = requests.get(title)
# this separates the content
doc2 = pq(response_title.content)
# this finds the song lyrics
verse = doc2('.verse')
# this prints the song lyrics
print verse.text
在response_title = requests.get(title)中,python没有认识到标题是一个链接,这是有道理的。但是我怎么能得到实际的呢?感谢您的帮助。
答案 0 :(得分:4)
替换
response_title = requests.get(title)
与
response_title = requests.get(title.attrib['href'])
完整的工作脚本(以下评论中的固定注释)
#!/usr/bin/python
from pyquery import PyQuery as pq
from lxml import etree
import requests
from bs4 import BeautifulSoup
# this visits the website
response = requests.get('http://www.metrolyrics.com/drake-lyrics.html')
# this separates the different types of content
doc = pq(response.content)
# this finds the titles in the content
titles = doc('.title')
# this visits each title, then prints each verse
for title in titles:
# this visits each title
#response_title = requests.get(title)
response_title = requests.get(title.attrib['href'])
# this separates the content
doc2 = pq(response_title.content)
# this finds the song lyrics
verse = doc2('.verse')
# this prints the song lyrics
print verse.text()
答案 1 :(得分:0)
如果你想使用BeautifulSoup的所有文字:
r = requests.get('http://www.metrolyrics.com/drake-lyrics.html')
soup = (a["href"] for a in BeautifulSoup(r.content).find_all("a", "title", href=True))
verses = (BeautifulSoup(requests.get(url).content).find_all("p", "verse") for url in soup)
for verse in verses:
print([v.text for v in verse])