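Python BeautifulSoup find_all

Date: 2020-05-20 03:23:26

Tags: python beautifulsoup

Hi, I'm trying to scrape some information from a website. Please forgive me if the formatting is wrong; this is my first post on SO.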

soup.find('div', {"class":"stars"}) 

From this I get:

<div class="stars" title="4.0 star rating">
<i class="star star--large star-0"></i>
<i class="star star--large star-1"></i>
<i class="star star--large star-2"></i>
<i class="star star--large star-3"></i>
<i class="star star--large star-4 star--large--muted"></i>
</div>

I need the "4.0 star rating" value.

When I use:

soup.find('div', {"class":"stars"})["title"]

it works, but the same indexing does not work with find_all. I'm trying to find every occurrence and put the values into a list.

Here is my full code:

    def get_info():
        import requests
        from bs4 import BeautifulSoup
        # all_reviews_list, all_dates_list, all_titles_list and all_star_ratings
        # are assumed to be lists defined elsewhere (not shown in the post).
        n = 1
        for page in range(53):
            url = f"https://www.sitejabber.com/reviews/apple.com?page={n}&sort=Reviews.processed&direction=DESC#reviews"
            r = requests.get(url)
            soup = BeautifulSoup(r.text, 'lxml')
            all_reviews = soup.find_all('div', {'class': "truncate_review"})
            all_dates = soup.find_all('div', {'class': 'review__date'})
            all_titles = soup.find_all('span', {'class': 'review__title__text'})
            reviews_class = soup.find('div', {"class": "review__stars"})
            for review in all_reviews:
                all_reviews_list.append(review.text.replace("\n", "").replace("\t", ""))
            for date in all_dates:
                all_dates_list.append(date.text.replace("\n", "").replace("\t", ""))
            for title in all_titles:
                all_titles_list.append(title.text.replace("\n", "").replace("\t", ""))
            for stars in reviews_class.find_all('div', {'class': 'stars'}):
                all_star_ratings.append(stars['title'])
            n += 1

Sorry that my indentation is a bit messy, but this is my full code.

2 Answers:

Answer 0: (score: 0)

You index a bs4 element's attributes the same way you index a dictionary.
If you are using find():

soup.find('div', {"class":"stars"}) ['title']

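This works because find() returns a single element.
find_all(), however, returns a list of elements, and indexing a list with a string (list[string]) is not a valid operation.
So you can build the list like this: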

res = []
for i in soup.find_all('div', {"class":"stars"}):
    res.append(i['title'])

Or, as a one-liner:

res = [i['title'] for i in soup.find_all('div', {"class":"stars"})]
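To make the difference concrete, here is a minimal sketch. The HTML fragment is made up for illustration, and it uses the built-in html.parser rather than lxml so that it runs without extra dependencies.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment with two rating blocks, for illustration only.
html = """
<div class="stars" title="4.0 star rating"></div>
<div class="stars" title="1.0 star rating"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# find() returns a single Tag (or None), so attribute lookup works directly.
first = soup.find("div", {"class": "stars"})["title"]

# find_all() returns a list-like ResultSet; indexing it with a string fails.
matches = soup.find_all("div", {"class": "stars"})
try:
    matches["title"]
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__

# Index each Tag in the ResultSet individually instead.
titles = [tag["title"] for tag in matches]
print(first, error_name, titles)
```

Note that find() returns None when nothing matches, so on a real page it is worth checking the result before indexing it.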

Since you want the ratings of all the reviews, you need to restrict the search to the review containers, i.e. scrape from:

<div class="review__container">

So the code would be:

review = soup.find_all('div',class_="review__container")
res = [i['title'] for j in review for i in j.find_all('div',class_='stars')]

Which gives:

['1.0 star rating', '1.0 star rating', '3.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '5.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '1.0 star rating', '5.0 star rating', '2.0 star rating', '5.0 star rating', '1.0 star rating', '2.0 star rating', '1.0 star rating', '5.0 star rating', '1.0 star rating', '5.0 star rating']
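Why the container matters can be shown with a small sketch. The markup below is hypothetical: it assumes the page has one stars div outside the reviews (such as an overall rating) in addition to one per review, which may or may not match the real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: one rating outside any review, two inside reviews.
html = """
<div class="stars" title="4.0 star rating"></div>
<div class="review__container"><div class="stars" title="1.0 star rating"></div></div>
<div class="review__container"><div class="stars" title="5.0 star rating"></div></div>
"""
soup = BeautifulSoup(html, "html.parser")

# Searching the whole document also picks up the rating outside the reviews.
every_rating = [tag["title"] for tag in soup.find_all("div", class_="stars")]

# Searching inside each review container keeps only per-review ratings.
review_ratings = [
    stars["title"]
    for container in soup.find_all("div", class_="review__container")
    for stars in container.find_all("div", class_="stars")
]
print(every_rating)
print(review_ratings)
```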

Answer 1: (score: 0)

The following code:

from bs4 import BeautifulSoup

html = """<div class="stars" title="4.0 star rating">
<i class="star star--large star-0"></i>
<i class="star star--large star-1"></i>
<i class="star star--large star-2"></i>
<i class="star star--large star-3"></i>
<i class="star star--large star-4 star--large--muted"></i>
</div>"""

soup = BeautifulSoup(html, features="lxml")
element = soup.select('.stars')[0]['title']
print(element)

prints:

4.0 star rating

Using the URL:

import requests
from bs4 import BeautifulSoup

n = 1  # page number to request
url = f'https://www.sitejabber.com/reviews/apple.com?page={n}&sort=Reviews.processed&direction=DESC#reviews'
page = requests.get(url=url)

soup = BeautifulSoup(page.text, features="lxml")

elements = soup.select('.stars')
# print(elements)

for element in elements:
    print(element['title'])
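To collect the ratings from several pages into one list, one possible approach is to separate parsing from fetching. This is only a sketch: parse_ratings, collect_ratings and fake_pages are hypothetical names, and the canned HTML stands in for real responses so that the example runs without network access.

```python
from bs4 import BeautifulSoup

def parse_ratings(html):
    """Return the title attribute of every div.stars in one page of HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # The [title] part skips any div.stars that has no title attribute.
    return [tag["title"] for tag in soup.select("div.stars[title]")]

def collect_ratings(fetch, pages):
    """Call fetch(page_number) for each page and merge the parsed ratings."""
    ratings = []
    for n in pages:
        ratings.extend(parse_ratings(fetch(n)))
    return ratings

# Stand-in for the network: hypothetical canned pages keyed by page number.
fake_pages = {
    1: '<div class="stars" title="1.0 star rating"></div>',
    2: '<div class="stars" title="5.0 star rating"></div>',
}
result = collect_ratings(fake_pages.get, range(1, 3))
print(result)
```

With requests, the fetch argument could be a small function that formats the page number into the URL above and returns requests.get(...).text.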