Finding ratings from a URL

Time: 2019-02-21 21:37:44

Tags: python web-scraping beautifulsoup nlp

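I am trying to build a data frame of reviews for 20 banks. With the code below I am trying to extract the rating values given by 20 customers, but I am new to BeautifulSoup and web scraping, so I am finding it difficult.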

import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.bankbazaar.com/reviews.html'
page = requests.get(url)
print(page.text)
soup = BeautifulSoup(page.text,'html.parser')


Rating = []
rat_elem = soup.find_all('span')
for rate in rat_elem:
    Rating.append(rate.find_all('div').get('value'))

print(Rating)

2 answers:

Answer 0 (score: 2)

I prefer CSS selectors. You should be able to target every span whose itemprop attribute is set to ratingvalue.

import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.bankbazaar.com/reviews.html'
page = requests.get(url)
print(page.text)
soup = BeautifulSoup(page.text,'html.parser')

Rating = []
for rate in soup.select('span[itemprop=ratingvalue]'):
    Rating.append(rate.get_text()) 

print(Rating)

Relevant output

['4.0', '5.0', '5.0', '5.0', '4.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '4.5', '4.0', '4.0', '4.0']  

Edit: added the relevant output.
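To try the selector without hitting the live site, here is a minimal sketch that runs the same `select` call against a small inline HTML string. The markup in `SAMPLE_HTML` is made up for illustration and only assumes that each rating sits in a span with `itemprop="ratingvalue"`; the real page may be structured differently.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the live page (an assumption,
# not a copy of the real site's HTML).
SAMPLE_HTML = """
<div class="review">
  <span itemprop="ratingvalue">4.0</span>
  <span class="author">Reviewer A</span>
</div>
<div class="review">
  <span itemprop="ratingvalue">4.5</span>
  <span class="author">Reviewer B</span>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# The attribute selector keeps only spans whose itemprop is "ratingvalue";
# the author spans are skipped.
ratings = [
    span.get_text(strip=True)
    for span in soup.select('span[itemprop="ratingvalue"]')
]
print(ratings)  # ['4.0', '4.5']
```

Quoting the attribute value in the selector is optional here, but it avoids surprises if the value ever contains characters that are not valid in a bare CSS identifier.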

Answer 1 (score: 0)

import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.bankbazaar.com/reviews.html'
page = requests.get(url)
print(page.text)
soup = BeautifulSoup(page.text,'html.parser')

# Find all the span elements where the "itemprop" attribute is "ratingvalue". 
Rating = [item.text for item in soup.find_all('span', attrs={"itemprop":"ratingvalue"})]


print(Rating)
# The output
# ['4.0', '5.0', '5.0', '5.0', '4.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '5.0', '5.0', '5.0', '5.0', '4.0', '4.5', '4.0', '4.0', '4.0']

BeautifulSoup keyword arguments
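For context on why the attempt in the question fails: `find_all` returns a ResultSet, which is a list of Tag objects, so `.get()` or `.text` has to be called on each Tag rather than on the list itself. The sketch below shows this on made-up markup (the `SAMPLE_HTML` string is an assumption, not the real page) and converts the extracted strings to floats so they can be used numerically.

```python
from bs4 import BeautifulSoup

# Hypothetical markup; the real page's structure may differ.
SAMPLE_HTML = """
<ul>
  <li><span itemprop="ratingvalue">5.0</span></li>
  <li><span itemprop="ratingvalue">4.5</span></li>
  <li><span itemprop="name">Some Bank</span></li>
</ul>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")

# find_all returns a ResultSet (a list of Tag objects). Attribute and text
# access must happen per Tag, inside the loop or comprehension.
spans = soup.find_all("span", attrs={"itemprop": "ratingvalue"})

# Convert each rating string to a float.
ratings = [float(span.text.strip()) for span in spans]
print(ratings)  # [5.0, 4.5]
```

The `attrs` dictionary filters on the attribute value, so the span carrying `itemprop="name"` is left out of the result.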