Scraping the alt tag with Python and BeautifulSoup

Asked: 2017-06-08 06:36:03

Tags: python beautifulsoup

I'm new to Python and BeautifulSoup, and I'm trying to scrape the number of stars each reviewer left for a restaurant on Yelp.

So far I have the following code:

import requests
from bs4 import BeautifulSoup as soup

url = "https://www.yelp.com/biz/monkey-house-cafe-huntington-beach"
r = requests.get(url)
page_soup = soup(r.content, "lxml")

review_container = page_soup.findAll("div", {"class": "review-content"})
review_container[0]

When I run this code in a Jupyter Notebook, I get the following output, which corresponds to the most recent review:

<div class="review-content">
<div class="biz-rating biz-rating-large clearfix">
<div>
<div class="i-stars i-stars--regular-5 rating-large" title="5.0 star rating">
<img alt="5.0 star rating" class="offscreen" height="303" src="https://s3-media1.fl.yelpcdn.com/assets/srv0/yelp_design_web/41341496d9db/assets/img/stars/stars.png" width="84"/>
</div>
</div>
<span class="rating-qualifier">
    5/10/2017
</span>
</div>
<p lang="en">This place is really fun and cute. I was happy to discover it.. <br/><br/>They also have beer and wine here, which is kind of a nice bonus. The sangria is good..</p>
</div>

My question is: how do I get the number of stars from each review?

I think it would be best to scrape the contents of the img alt tag, but I'm not sure how to do that.

2 Answers:

Answer 0: (score: 1)

If you want to extract it from the img alt, you can use:

review_container[0].select('img')[0]['alt'].split()[0]
'5.0'
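
The one-liner above only reads the first review, while the question asks for the rating of every review. A minimal sketch of the same idea applied in a loop might look like the following. The HTML string is a stand-in that mimics the structure shown in the question (the second review is invented for illustration), so the example runs without fetching the live Yelp page, whose markup may have changed since.

```python
from bs4 import BeautifulSoup

# Stand-in HTML with the same structure as the output in the question.
# The second review is a made-up sample, not real Yelp data.
html = """
<div class="review-content">
  <div class="i-stars"><img alt="5.0 star rating" class="offscreen"/></div>
  <p lang="en">This place is really fun and cute.</p>
</div>
<div class="review-content">
  <div class="i-stars"><img alt="3.0 star rating" class="offscreen"/></div>
  <p lang="en">It was okay.</p>
</div>
"""

page_soup = BeautifulSoup(html, "html.parser")
review_container = page_soup.find_all("div", {"class": "review-content"})

ratings = []
for review in review_container:
    # First <img> inside this review that actually has an alt attribute
    img = review.find("img", alt=True)
    if img is None:
        continue  # skip reviews without a rating image
    # "5.0 star rating" -> "5.0" -> 5.0
    ratings.append(float(img["alt"].split()[0]))

print(ratings)
```

Skipping reviews without an img is a design choice for this sketch; you could append None instead if you need the list to stay aligned with review_container.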

Answer 1: (score: 0)

float(review_container[0].find("img")["alt"][:3])
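
Note that the slice [:3] assumes the alt text always starts with exactly three characters of rating, such as "5.0". If you would rather not depend on that, one possible variant pulls the first number out with a regular expression. This is only a sketch: star_rating is a hypothetical helper name, and the HTML is a small stand-in for the Yelp markup in the question.

```python
import re
from bs4 import BeautifulSoup

# Stand-in for a single review block (sample data, not fetched from Yelp)
html = '<div class="review-content"><img alt="4.5 star rating" class="offscreen"/></div>'
review = BeautifulSoup(html, "html.parser").find("div", class_="review-content")


def star_rating(review_tag):
    """Return the rating as a float, or None if no parsable alt text is found."""
    img = review_tag.find("img", alt=True)
    if img is None:
        return None
    # First integer or decimal number in the alt text, e.g. "4.5"
    match = re.search(r"\d+(?:\.\d+)?", img["alt"])
    return float(match.group()) if match else None


print(star_rating(review))
```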