Question

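I'm experimenting with Python 3.6 and BeautifulSoup on a Mac. I'm trying to build a simple program that scrapes lyrics from a URL and stores them as plain text in a single variable, but I can't work out how to iterate over the HTML content.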

Here is the code I'm running:

import requests
import re
from bs4 import BeautifulSoup

r = requests.get("http://www.metrolyrics.com/juicy-lyrics-notorious-big.html")
c = r.content
soup = BeautifulSoup(c, "html.parser")

all = soup.find_all("p",{"class":"verse"})
all[0:10]

for item in all:
    print(item.find_all("p",{"class":"verse"})[0].text)

The last two lines of code raise a "list index out of range" error.

Also, if I try all = all.text, I get the following error:

AttributeError: ResultSet object has no attribute 'text'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?

I guess this should be something simple, but I can't figure out how to do it.

Thanks

Answer 1

The item in the loop is already a BeautifulSoup Tag (check with type(all[0]) -> <class 'bs4.element.Tag'>).

So you can extract the text from it directly:

for item in all:
    print(item.text)

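The slice all[0:10] does not raise an error, even if all has fewer than 10 elements. The "list index out of range" error comes from item.find_all("p",{"class":"verse"})[0] inside the loop: each item is already a verse paragraph and most likely contains no nested <p class="verse">, so find_all returns an empty list and indexing it with [0] fails.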
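To get all the lyrics into a single variable, as the question asks, one option is to join the text of every matched tag. The following is only a minimal sketch: the inline html string is a made-up stand-in for the downloaded page, and the names verses, first_ten and lyrics are mine, not from the question (verses is used instead of all to avoid shadowing the built-in all()).

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup standing in for the downloaded page (an assumption).
html = """
<div>
  <p class="verse">First verse, line one<br>First verse, line two</p>
  <p class="verse">Second verse</p>
  <p class="other">Not lyrics</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
verses = soup.find_all("p", {"class": "verse"})

# Slicing past the end of a list is safe: it returns whatever is there.
first_ten = verses[0:10]

# Each item is already a Tag, so read its text directly and join the verses.
# get_text("\n", strip=True) puts a newline between the text pieces of one verse.
lyrics = "\n\n".join(v.get_text("\n", strip=True) for v in verses)
print(lyrics)
```

With the real page, soup would presumably be built from r.content as in the question, assuming the page still marks its verses with class "verse".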
