Checking whether a specific class exists in HTML using BeautifulSoup (Python)

Time: 2016-11-22 16:18:54

Tags: python html web-scraping beautifulsoup

I'm writing a script and want to check whether a specific class exists in the HTML.

from bs4 import BeautifulSoup
import requests

def makesoup(u):
    page=requests.get(u)
    html=BeautifulSoup(page.content,"lxml")
    return html
html=makesoup('https://www.yelp.com/biz/soco-urban-lofts-dallas')

print("3 star",html.has_attr("i-stars i-stars--large-3 rating-very-large")) #it's returning False
res = html.find('i-stars i-stars--large-3 rating-very-large') #it's returning None

How can I solve this? Getting the title (title="3.0 star rating") would also work for me. Screenshot of the console HTML:
<div class="i-stars i-stars--large-3 rating-very-large" title="3.0 star rating">
  <img class="offscreen" height="303" src="https://s3-media1.fl.yelpcdn.com/assets/srv0/yelp_design_web/8a6fc2d74183/assets/img/stars/stars.png" width="84" alt="3.0 star rating">
    </div>
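If the title alone is enough, one option is to find the rating element and read its title attribute. Below is a minimal offline sketch that assumes the markup matches the snippet above (the live Yelp page may differ). The helper name star_title is hypothetical. It uses the built-in "html.parser" so that lxml is not required.

```python
from bs4 import BeautifulSoup

# Markup copied from the console screenshot above; the live Yelp page may differ.
SNIPPET = """
<div class="i-stars i-stars--large-3 rating-very-large" title="3.0 star rating">
  <img class="offscreen" height="303" width="84" alt="3.0 star rating">
</div>
"""


def star_title(markup):
    """Return the title of the first div with class 'i-stars', or None (hypothetical helper)."""
    soup = BeautifulSoup(markup, "html.parser")
    # A single class name in class_ matches any element whose class list contains it.
    div = soup.find("div", class_="i-stars")
    # Tag.get returns None instead of raising KeyError when the attribute is missing.
    return div.get("title") if div is not None else None


print(star_title(SNIPPET))                  # 3.0 star rating
print(star_title("<p>no rating here</p>"))  # None
```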

3 Answers:

Answer 0 (score: 1)

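I ran into a similar problem when trying to get the exact classes. A tag's attributes are available as a dictionary (attrs), and the class entry is a list of class names, as shown below.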

from bs4 import BeautifulSoup

html = '<div class="i-stars i-stars--large-3 rating-very-large" title="3.0 star rating">'
soup = BeautifulSoup(html, 'html.parser')
find = soup.div
classes = find.attrs['class']     # list of all class names
c1 = find.attrs['class'][0]       # first class name only
print(classes, c1)
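Because the class value is a list, you can check for one class with in, or for several with a set comparison, without depending on their order. This is a small sketch; has_all_classes is a hypothetical helper name, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

html = '<div class="i-stars i-stars--large-3 rating-very-large" title="3.0 star rating"></div>'
div = BeautifulSoup(html, "html.parser").div


def has_all_classes(tag, *names):
    """True if the tag's class list contains every given name (hypothetical helper)."""
    # "class" is multi-valued, so tag.get returns a list; default to [] if absent.
    return set(names).issubset(tag.get("class", []))


print(div["class"])                                        # ['i-stars', 'i-stars--large-3', 'rating-very-large']
print("i-stars--large-3" in div["class"])                  # True
print(has_all_classes(div, "rating-very-large", "i-stars"))  # True
print(has_all_classes(div, "i-stars--large-4"))            # False
```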

Answer 1 (score: 0)

has_attr is a method that checks whether an element has an attribute with the given name. class is an attribute, and i-stars i-stars--large-3 rating-very-large is its value, so has_attr returns False when you pass it that string.

find does not work with that string either, because it treats its first argument as a tag name. You are looking for an element that has all of these classes, which is what a CSS selector expresses. Since find does not accept CSS selectors, use select_one instead: html.select_one('div.i-stars.i-stars--large-3.rating-very-large').
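Here is a small offline check of these points, using the snippet from the question. It shows that has_attr looks up attribute names, that find treats its first argument as a tag name, and that select_one accepts a CSS selector. Note that select_one relies on the soupsieve package, which recent bs4 releases install as a dependency.

```python
from bs4 import BeautifulSoup

html = '<div class="i-stars i-stars--large-3 rating-very-large" title="3.0 star rating"></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.div

# has_attr takes an attribute *name*, not an attribute value.
print(div.has_attr("class"))                                       # True
print(div.has_attr("i-stars i-stars--large-3 rating-very-large"))  # False

# find's first argument is a tag name, so a CSS selector string matches nothing.
print(soup.find("div.i-stars.i-stars--large-3.rating-very-large"))  # None

# select_one takes a CSS selector; it requires all three classes, in any order.
match = soup.select_one("div.i-stars.i-stars--large-3.rating-very-large")
print(match is not None)  # True
print(match["title"])     # 3.0 star rating
```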

Answer 2 (score: 0)

from bs4 import BeautifulSoup
import requests

def makesoup(u):
    page=requests.get(u)
    html=BeautifulSoup(page.content,"lxml")
    return html
html=makesoup('https://www.yelp.com/biz/soco-urban-lofts-dallas')
res = html.find(class_='i-stars i-stars--large-3 rating-very-large')
if res:
    print("3 star", 'whatever you want print')

Output:

3 star whatever you want print
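One caveat with this approach: when class_ is given a string containing spaces, BeautifulSoup compares it with the entire class attribute value, so the classes must appear in exactly that order. A CSS selector does not have this restriction. The sketch below illustrates this using made-up markup (the second, reordered div is hypothetical).

```python
from bs4 import BeautifulSoup

html = """
<div class="i-stars i-stars--large-3 rating-very-large" title="3.0 star rating"></div>
<div class="rating-very-large i-stars i-stars--large-3" title="reordered"></div>
"""
soup = BeautifulSoup(html, "html.parser")

# A class_ string containing spaces is matched against the whole attribute value,
# so only the div whose classes appear in exactly this order is returned.
exact = soup.find_all(class_="i-stars i-stars--large-3 rating-very-large")
print([d["title"] for d in exact])      # ['3.0 star rating']

# A CSS selector checks each class independently, so both divs match.
any_order = soup.select("div.i-stars.i-stars--large-3.rating-very-large")
print([d["title"] for d in any_order])  # ['3.0 star rating', 'reordered']
```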