Extracting an element's title attribute with beautifulsoup4

Time: 2017-08-24 15:27:21

Tags: html beautifulsoup

I want to extract the rating percentage from the review-rating popup; it is stored in the link's title attribute. Here is the HTML:

    <a class="a-link-normal" href="http://www.amazon.in/product-reviews/B01FM7GGFI/ref=cm_cr_dp_hist_one/261-4285111-5015802?ie=UTF8&amp;filterByStar=one_star&amp;reviewerType=all_reviews&amp;showViewpoints=0" title="11% of reviews have 1 stars">1 star</a>

My BeautifulSoup Python script:

    from bs4 import BeautifulSoup
    import requests

    url = "http://www.amazon.in/Samsung-G-550FY-On5-Pro-Gold/dp/B01FM7GGFI/ref=lp_4363159031_1_1/261-4285111-5015802?s=electronics&ie=UTF8&qid=1503582445&sr=1-1"

    headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/54.0.2840.71 Safari/537.36'}
    r = requests.get(url, headers=headers)
    soup = BeautifulSoup(r.content, "lxml")

    for link in soup.find_all("div", attrs={"class": "a-fixed-left-grid-col a-col-left"}):
        for link1 in link.find_all("a", attrs={"class": "a-link-normal"}):
            print(link1)

1 answer:

Answer 0: (score: 0)

    from bs4 import BeautifulSoup

    html = '<a class="a-link-normal" href="http://www.amazon.in/product-reviews/B01FM7GGFI/ref=cm_cr_dp_hist_one/261-4285111-5015802?ie=UTF8&amp;filterByStar=one_star&amp;reviewerType=all_reviews&amp;showViewpoints=0" title="11% of reviews have 1 stars">1 star</a>'
    soup = BeautifulSoup(html, 'lxml')

    # Find the matching anchors, then read the title attribute where one exists.
    a_tags = soup.find_all('a', class_='a-link-normal')
    for a in a_tags:
        if 'title' in a.attrs:
            print(a['title'])
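
If only the number is needed rather than the whole title string, one possible follow-up is to parse the title with a regular expression. The sketch below is illustrative only: the sample markup is modeled on the anchor from the question (the `href="#"` placeholder and the second, title-less anchor are made up), the title format "N% of reviews have M stars" is assumed from that single example, and `rating_percentages` is a hypothetical helper name. It uses the built-in `html.parser` so that lxml is not required.

```python
import re

from bs4 import BeautifulSoup

# Sample markup modeled on the question's anchor; href is a placeholder
# and the second anchor is invented to show a tag without a title.
SAMPLE_HTML = """
<div class="a-fixed-left-grid-col a-col-left">
  <a class="a-link-normal" href="#" title="11% of reviews have 1 stars">1 star</a>
  <a class="a-link-normal" href="#">no title here</a>
</div>
"""

# Assumed title format: "<N>% of reviews have <M> stars".
TITLE_PATTERN = re.compile(r"(\d+)% of reviews have (\d+) stars?")


def rating_percentages(html):
    """Return {star_count: percent} for every anchor whose title matches."""
    soup = BeautifulSoup(html, "html.parser")
    result = {}
    # title=True keeps only <a> tags that actually carry a title attribute.
    for a in soup.find_all("a", class_="a-link-normal", title=True):
        match = TITLE_PATTERN.search(a["title"])
        if match:
            percent, stars = int(match.group(1)), int(match.group(2))
            result[stars] = percent
    return result


print(rating_percentages(SAMPLE_HTML))
```

Passing `title=True` to `find_all` filters out anchors without a title up front, so the explicit `'title' in a.attrs` check is no longer needed. If the live page words its titles differently, the pattern would need adjusting.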