I'm trying to get information from the page below:
http://books.toscrape.com/
I want to get the rating (number of stars) of each book, so I used the code below:
import requests
from bs4 import BeautifulSoup
import re
response = requests.get('http://books.toscrape.com/')
if response.status_code == 200:
    print('Request successful!')
    soup = BeautifulSoup(response.text, "html.parser")
    linhas = soup.find_all(class_=re.compile("rating"))
    print(linhas)
But what I get back is this:
<p class="star-rating Three">
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
</p>,
What am I doing wrong?
Answer:
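The rating is not in the element's text, and it is not in the number of <i> tags either: the snippet above has five icon-star tags even though the book is rated Three. The value is encoded in the class name itself (class="star-rating Three"). BeautifulSoup returns the multi-valued class attribute as a list, e.g. ['star-rating', 'Three'], so the rating word can be extracted with d.attrs['class'][1] (or the shorthand d['class'][1]).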
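A quick offline check on the <p> element from the question (pasted in as a string, so no request is made) shows why index 1 works: BeautifulSoup splits the class attribute into a list, while the element's text is empty and its five icons are identical.

```python
from bs4 import BeautifulSoup

# The element from the question, pasted in as a string so no request is needed.
snippet = """
<p class="star-rating Three">
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
<i class="icon-star"></i>
</p>
"""

soup = BeautifulSoup(snippet, "html.parser")
p = soup.find("p", class_="star-rating")

# "class" is a multi-valued attribute, so BeautifulSoup returns it as a list.
print(p["class"])     # ['star-rating', 'Three']
print(p["class"][1])  # Three

# The icons are identical and there is no text, so neither reveals the rating.
print(len(p.find_all("i")))          # 5
print(repr(p.get_text(strip=True)))  # ''
```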
import requests
from bs4 import BeautifulSoup
response = requests.get('http://books.toscrape.com/')
soup = BeautifulSoup(response.text, "html.parser")
# each book's rating is a <p> whose classes include "star-rating"
data = soup.find_all("p", class_="star-rating")
for d in data:
    # d.attrs['class'] is a list such as ['star-rating', 'Three']
    print(d.attrs['class'][1])
Output:
Three
One
One
Four
..
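If you need the number of stars as an integer rather than the word, a small lookup table is enough. This is a sketch under assumptions: it assumes the site spells ratings One to Five in the class name (the output above only shows Three, One and Four), and `WORD_TO_STARS` and `star_ratings` are hypothetical names. The rating word is picked by membership in the table instead of by position, so the result does not depend on the order of the class names. It runs on a hand-written HTML string; on the live page you would pass `response.text` from the answer's code.

```python
from bs4 import BeautifulSoup

# Assumption: ratings are spelled One..Five in the class name.
WORD_TO_STARS = {"One": 1, "Two": 2, "Three": 3, "Four": 4, "Five": 5}


def star_ratings(html):
    """Return one int per <p class="star-rating ...">, or None if no rating word is found."""
    soup = BeautifulSoup(html, "html.parser")
    ratings = []
    for p in soup.find_all("p", class_="star-rating"):
        # Pick the class name that is a rating word instead of assuming index 1.
        word = next((c for c in p["class"] if c in WORD_TO_STARS), None)
        ratings.append(WORD_TO_STARS.get(word))
    return ratings


# Hand-written sample; on the real page, pass response.text instead.
sample = """
<p class="star-rating Three"></p>
<p class="star-rating One"></p>
<p class="star-rating Four"></p>
"""
print(star_ratings(sample))  # [3, 1, 4]
```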