Question

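I am writing a scraping script for the site "yelp.fr", and I want to scrape the star rating, which is generated automatically from the class: class="i-stars i-stars--regular-4 rating-large" => 4 stars, class="i-stars i-stars--regular-3-half rating-large" => 3.5 stars.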

My question: how can I do this? How can I check whether a class exists or not on the HTML page?
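A minimal sketch of the class-to-stars mapping, assuming every rating class follows the two patterns shown above (i-stars--regular-N and i-stars--regular-N-half); any other variants Yelp may use are not handled. The function name stars_from_classes is hypothetical. It takes a list of class names because BeautifulSoup returns a tag's class attribute as a list.

```python
import re

# Matches e.g. "i-stars--regular-4" or "i-stars--regular-3-half"
STAR_CLASS = re.compile(r"^i-stars--regular-(\d+)(-half)?$")


def stars_from_classes(classes):
    """Return the star rating encoded in a list of CSS classes, or None."""
    for cls in classes:
        match = STAR_CLASS.match(cls)
        if match:
            stars = float(match.group(1))
            if match.group(2):  # the "-half" suffix adds half a star
                stars += 0.5
            return stars
    return None  # no i-stars--regular-* class present


print(stars_from_classes("i-stars i-stars--regular-4 rating-large".split()))       # 4.0
print(stars_from_classes("i-stars i-stars--regular-3-half rating-large".split()))  # 3.5
```

With BeautifulSoup you would call it as stars_from_classes(tag.get("class", [])), so a tag without a class attribute simply yields None.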

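from bs4 import BeautifulSoup
from selenium import webdriver

CITIES = "la rochelle(17000)"
places = "Bars"
driver = webdriver.Chrome()
driver.get("https://www.yelp.fr/search?find_desc=" + places + "&find_loc=" + CITIES)
page = driver.page_source
driver.quit()
soup = BeautifulSoup(page, "lxml")
# find_all returns a (possibly empty) list of matching <div> elements
etoiles = soup.find_all("div", {"class": "biz-rating biz-rating-large clearfix"})

if etoiles:
    for div in etoiles:
        # get_attribute() is a Selenium method; BeautifulSoup tags use .get()
        stars = div.find("div", class_="i-stars")
        print(stars.get("title") if stars else "no star div")
else:
    print("not found")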

Sometimes the class biz-rating biz-rating-large clearfix is not present on the page at all.
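Since find_all returns an empty list when nothing matches, "if etoiles:" already tests whether the class exists anywhere on the page. To find out which individual result lacks a rating, one option is to check each result container separately. The sketch below assumes the results are li elements with class regular-search-result (the container used in Answer 2); the sample HTML and the function name rating_divs are hypothetical, and it parses a string offline instead of fetching Yelp.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: one result with a rating, one without
SAMPLE_HTML = """
<ul>
  <li class="regular-search-result">
    <div class="biz-rating biz-rating-large clearfix">
      <div class="i-stars i-stars--regular-4 rating-large" title="4.0 star rating"></div>
    </div>
  </li>
  <li class="regular-search-result">
    <span class="review-count">No reviews yet</span>
  </li>
</ul>
"""


def rating_divs(page):
    """Return one entry per search result: its rating <div>, or None if missing."""
    soup = BeautifulSoup(page, "html.parser")
    found = []
    for result in soup.select("li.regular-search-result"):
        # select_one returns None instead of raising when nothing matches
        found.append(result.select_one("div.biz-rating"))
    return found


for div in rating_divs(SAMPLE_HTML):
    print("ok" if div is not None else "not found")
```

The CSS selector div.biz-rating matches any div that has biz-rating among its classes, so it does not depend on the exact order of the other classes.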

Answer 1

The title attribute of the DIV contains the star count/rating, so you can read the rating from that attribute.
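The answerer's original code did not survive. The following is a sketch, not that code, of reading the title attribute with BeautifulSoup and turning it into a number. The title text "3,5 étoiles" is a hypothetical example of what yelp.fr might show, and rating_from_title is a hypothetical name; the regex accepts both "3.5" and "3,5".

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup; the real Yelp title text may differ
SAMPLE_HTML = '<div class="i-stars i-stars--regular-3-half rating-large" title="3,5 étoiles"></div>'


def rating_from_title(page):
    """Return the rating from the i-stars div's title attribute, or None if absent."""
    soup = BeautifulSoup(page, "html.parser")
    div = soup.find("div", class_="i-stars")  # class_ matches any one of the div's classes
    if div is None:
        return None
    title = div.get("title")  # .get returns None when the attribute is missing
    if not title:
        return None
    match = re.search(r"\d+(?:[.,]\d+)?", title)
    # Normalise a French decimal comma before converting
    return float(match.group(0).replace(",", ".")) if match else None


print(rating_from_title(SAMPLE_HTML))                      # 3.5
print(rating_from_title("<div class='biz-rating'></div>"))  # None
```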

Answer 2

I solved the problem with this:

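import re
import requests
from lxml import html
from unidecode import unidecode

# Example inputs; the original snippet uses place, city and id without defining them
place = "Bars"
city = "La Rochelle"
start = 0  # pagination offset (called id in the original)
yelp_url = "https://www.yelp.com/search?find_desc=%s&find_loc=%s&start=%s" % (place, city, start)

headers1 = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
response1 = requests.get(yelp_url, headers=headers1).text  # actually send the User-Agent
parser = html.fromstring(response1)
print("Parsing the page")
listing1 = parser.xpath("//li[@class='regular-search-result']")
for results in listing1:
    # Rating text is the title of the div whose class contains "rating-large"
    raw_ratings = results.xpath(".//div[contains(@class,'rating-large')]/@title")
    raw_price_range = results.xpath(".//span[contains(@class,'price-range')]//text()")
    # The three selectors below are assumptions; the original does not show them
    raw_address = results.xpath(".//address//text()")
    is_reservation_available = results.xpath(".//span[contains(@class,'reservation')]")
    is_accept_pickup = results.xpath(".//span[contains(@class,'order')]")

    cleaned_ratings = ''.join(raw_ratings).strip()
    match = re.search(r"\d+(?:[.,]\d+)?", cleaned_ratings)
    ratings = match.group(0) if match else 0  # 0 when the result has no rating
    price_range = len(''.join(raw_price_range)) if raw_price_range else 0
    address = unidecode(' '.join(' '.join(raw_address).split()))
    reservation_available = bool(is_reservation_available)
    accept_pickup = bool(is_accept_pickup)
    print(ratings, price_range, address)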
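Yelp's markup has changed since this answer was written, so the XPath logic is easier to check against a fixed snippet than against the live site. This sketch runs the same rating and price-range extraction on hypothetical HTML that mimics the old result list; the sample markup and the function name parse_listing are assumptions for illustration.

```python
import re

from lxml import html

# Hypothetical markup mimicking the old Yelp result list
SAMPLE_HTML = """
<ul>
  <li class="regular-search-result">
    <div class="i-stars i-stars--regular-4 rating-large" title="4.0 star rating"></div>
    <span class="business-attribute price-range">€€</span>
  </li>
  <li class="regular-search-result"></li>
</ul>
"""


def parse_listing(page):
    """Return (rating, price_range) for each result; missing values default to 0."""
    parser = html.fromstring(page)
    rows = []
    for result in parser.xpath("//li[@class='regular-search-result']"):
        raw_ratings = result.xpath(".//div[contains(@class,'rating-large')]/@title")
        raw_price_range = result.xpath(".//span[contains(@class,'price-range')]//text()")
        match = re.search(r"\d+(?:[.,]\d+)?", "".join(raw_ratings))
        ratings = match.group(0) if match else 0
        price_range = len("".join(raw_price_range)) if raw_price_range else 0
        rows.append((ratings, price_range))
    return rows


print(parse_listing(SAMPLE_HTML))  # [('4.0', 2), (0, 0)]
```

The second result has neither a rating div nor a price span, which shows how the empty XPath results fall through to the default of 0 instead of raising an error.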

Answer 3

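import re

# Runs inside the loop over listing1 from the parsing code in Answer 2
for results in listing1:
    raw_ratings = results.xpath(".//div[contains(@class,'rating-large')]/@title")  # assumed selector
    raw_review_count = results.xpath(".//span[contains(@class,'review-count')]//text()")
    raw_price_range = results.xpath(".//span[contains(@class,'price-range')]//text()")
    cleaned_ratings = ''.join(raw_ratings).strip()
    match = re.search(r"\d+(?:[.,]\d+)?", cleaned_ratings)
    ratings = match.group(0) if match else 0  # 0 when no rating is present
    review_count = re.sub(r"\D+", "", ''.join(raw_review_count))  # keep digits only
    price_range = len(''.join(raw_price_range)) if raw_price_range else 0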

Check whether a specific class and value exist in HTML using BeautifulSoup (Python)
