How do I scrape elements from an inner div?

Asked: 2017-01-19 16:58:38

Tags: python web-scraping beautifulsoup

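I can't get my code to print what I need. I can print the full list of divs I want to extract from without any problem, but when I try to pull the spans out of each div, nothing is printed.

Here is my code: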

import requests
from BeautifulSoup import BeautifulSoup

url = "https://reviews.solutionreach.com/vs/reviews/abate_and_ortisi?limit=50"

response = requests.get(url)
html = response.content
soup = BeautifulSoup(html)

myReviews = soup.findAll('div', attrs={'class': 'reviewSection'})

for item in myReviews:

    try:
        print item.contents[1].findAll('span', attrs={'class': 'rating'})[0].text
    except:
        pass
    try:
        print item.contents[1].findAll('span', attrs={'class': 'reviewTitle'})[0].text
    except:
        pass
    try:
        print item.contents[1].findAll('span', attrs={'class': 'reviewer'})[0].text
    except:
        pass

Here is the page I am trying to extract data from:

https://reviews.solutionreach.com/vs/reviews/abate_and_ortisi?limit=50

1 answer:

Answer 0: (score: 0)

import bs4, requests

r = requests.get('https://reviews.solutionreach.com/vs/reviews/abate_and_ortisi?limit=50')

# Parse the downloaded page; the 'lxml' parser requires the lxml package.
soup = bs4.BeautifulSoup(r.text, 'lxml')
for div in soup.find_all(class_="reviewBlock"):
    # The rating is stored in the title attribute of the value-title element.
    rating = div.find(class_="value-title").get('title')
    title = div.find(class_="reviewTitle").text
    reviewer = div.find(class_="reviewer").text
    print(rating, title, reviewer)

Output:

5 Always a pleasant visit!  Dr. Ortisi has been m... Justine F.
5 I absolutely Love the office staff and Dr Ortis... Richard C.
5 Love Dr Abate: does great painless work, friend... Nicole P.
5 It is always a pleasure to be at the Doctor's off. George C.
5 Best Denists in Town Elaine B.
5  Ronald J.
5 . Sharron L.
5 A great experience. Clean professional staff and ... Crystal R.
5 I had another great experience at your dental off... Vicki M.
5  David C.
5 I really like the office and dental staff at Orti... Ami A.
5 Steve Torakis Steven T.
5 Great visit Donald F.
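  1. find() searches all descendants, so there is no need for contents[1].
  2. Use find() if you need a single tag, and find_all() if you need a list of tags.
  3. The rating number is an attribute of the span tag, not its text.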
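
The three points above can be tried offline with a small sketch. The markup below is hypothetical: it reuses the class names from the answer, but the nesting is an assumption and not a copy of the real page. The helper name extract_reviews is also made up for illustration. It uses the built-in "html.parser" so that lxml is not required, and it checks for None because find() returns None when nothing matches, which would otherwise raise an AttributeError on reviews that lack a title.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: class names come from the answer above,
# but the structure is an assumption, not the real page.
SAMPLE_HTML = """
<div class="reviewBlock">
  <div class="reviewSection">
    <span class="rating"><span class="value-title" title="5"></span></span>
    <span class="reviewTitle">Great visit</span>
    <span class="reviewer">Donald F.</span>
  </div>
</div>
<div class="reviewBlock">
  <div class="reviewSection">
    <span class="rating"><span class="value-title" title="4"></span></span>
    <span class="reviewer">Ronald J.</span>
  </div>
</div>
"""


def extract_reviews(html):
    """Return one dict per review block (illustrative helper)."""
    soup = BeautifulSoup(html, "html.parser")
    reviews = []
    # find_all() returns a list of every matching tag.
    for block in soup.find_all(class_="reviewBlock"):
        # find() searches all descendants and returns one tag or None.
        rating_tag = block.find(class_="value-title")
        title_tag = block.find(class_="reviewTitle")
        reviewer_tag = block.find(class_="reviewer")
        reviews.append({
            # The rating lives in the title attribute, not in the text.
            "rating": rating_tag.get("title") if rating_tag is not None else None,
            "title": title_tag.get_text(strip=True) if title_tag is not None else "",
            "reviewer": reviewer_tag.get_text(strip=True) if reviewer_tag is not None else "",
        })
    return reviews


reviews = extract_reviews(SAMPLE_HTML)
for review in reviews:
    print(review["rating"], review["title"], review["reviewer"])
```

The second sample block has no reviewTitle span, so its title comes back as an empty string instead of raising an error.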