Using Python to visit links and print data

Date: 2015-06-03 21:55:30

Tags: python web-scraping beautifulsoup

I'm writing a web scraper and trying to retrieve Drake's lyrics. My scraper has to visit one site (the main MetroLyrics page), then visit each individual song link, and then print out the lyrics.

I'm having trouble visiting the second link. I've searched around BeautifulSoup and I'm confused. I was wondering if you could help.

# this is intended to print all of the drake song lyrics on metrolyrics

from pyquery import PyQuery as pq
from lxml import etree
import requests
from bs4 import BeautifulSoup

# this visits the website
response = requests.get('http://www.metrolyrics.com/drake-lyrics.html')

# this separates the different types of content
doc = pq(response.content)

# this finds the titles in the content
titles = doc('.title')

# this visits each title, then prints each verse
for title in titles:
    # this visits each title
    response_title = requests.get(title)
    # this separates the content
    doc2 = pq(response_title.content)
    # this finds the song lyrics
    verse = doc2('.verse')
    # this prints the song lyrics
    print verse.text

In response_title = requests.get(title), Python doesn't recognize that title is a link, which makes sense. But how can I get the actual URL? Thanks for your help.

2 Answers:

Answer 0 (score: 4)

Replace

response_title = requests.get(title)

with

response_title = requests.get(title.attrib['href'])
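The reason this works: iterating a PyQuery selection yields plain lxml elements, not URL strings, and lxml elements follow the ElementTree API, where attributes live in the `.attrib` dict. A minimal offline sketch of that API, using the standard library's `xml.etree.ElementTree` and a made-up HTML fragment in place of the real page:

```python
import xml.etree.ElementTree as ET

# hypothetical fragment standing in for the MetroLyrics page
fragment = ET.fromstring(
    '<p><a class="title" '
    'href="http://www.metrolyrics.com/best-i-ever-had-lyrics-drake.html">'
    'Best I Ever Had</a></p>'
)

for link in fragment.iter('a'):
    # the loop variable is an element object, not a string
    print(type(link).__name__)
    # the URL to pass to requests.get lives in the attrib dict
    print(link.attrib['href'])
```

Passing the element itself to `requests.get` fails precisely because it is an object like this, not a string.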

Full working script (the fix is marked in the comments below):

#!/usr/bin/python

from pyquery import PyQuery as pq
from lxml import etree
import requests
from bs4 import BeautifulSoup

# this visits the website
response = requests.get('http://www.metrolyrics.com/drake-lyrics.html')

# this separates the different types of content
doc = pq(response.content)

# this finds the titles in the content
titles = doc('.title')

# this visits each title, then prints each verse
for title in titles:
    # this visits each title
    #response_title = requests.get(title)
    response_title = requests.get(title.attrib['href'])

    # this separates the content
    doc2 = pq(response_title.content)
    # this finds the song lyrics
    verse = doc2('.verse')
    # this prints the song lyrics
    print verse.text()

Answer 1 (score: 0)

If you want all of the text using BeautifulSoup:

r = requests.get('http://www.metrolyrics.com/drake-lyrics.html')
soup = (a["href"] for a in BeautifulSoup(r.content).find_all("a", "title", href=True))
verses = (BeautifulSoup(requests.get(url).content).find_all("p", "verse") for url in soup)

for verse in verses:
    print([v.text for v in verse])
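Note that both `soup` and `verses` here are generator expressions, so they are lazy: no song page is actually fetched until the final loop iterates over `verses`. A minimal offline sketch of that chained-generator structure, with a hypothetical `pages` dict standing in for the network calls:

```python
# stand-in for "URL -> fetched verses" (no real requests are made)
pages = {
    'http://example.test/song-a': ['verse one', 'verse two'],
    'http://example.test/song-b': ['verse three'],
}

# like the href generator: nothing happens yet
urls = (u for u in pages)
# like the per-page fetch: still nothing happens
verses = (pages[u] for u in urls)

# only this loop drives both generators, one page at a time
for verse_list in verses:
    print(verse_list)
```

This keeps memory use low, but it also means any network error only surfaces inside the loop, page by page.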