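I am writing a scraping script for the site "yelp.fr", and I need to scrape the star count, which is encoded in an automatically generated class: class="i-stars i-stars--regular-4 rating-large" ==> 4 stars, class="i-stars i-stars--regular-3-half rating-large" ==> 3.5 stars.
My question: how can I do this? How can I check whether a class is present or absent in an HTML page?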
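from bs4 import BeautifulSoup
from selenium import webdriver

CITIES = "la rochelle(17000)"
places = "Bars"
driver = webdriver.Chrome()
driver.get("https://www.yelp.fr/search?find_desc="+places+"&find_loc="+CITIES+"")
page = driver.page_source
soup = BeautifulSoup(page,"lxml")
# find_all returns a list-like ResultSet of matching tags
etoiles=soup.find_all("div",{"class":"biz-rating biz-rating-large clearfix"})
# This is the failing line: a ResultSet has no get_attribute method, so it raises AttributeError
etoiles.get_attribute("title")
if etoiles:
    print("ok")
else:
    print("not ")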
Answer 0: (score: 0)
The title attribute of the DIV contains the star count/rating, so you can read the rating from it.
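A minimal sketch of that idea, assuming the stars are rendered as a div whose class list includes i-stars and whose title holds the rating text. To keep it runnable without extra packages it uses the standard library's html.parser rather than BeautifulSoup. The sample markup, the title wording, the StarParser class and the rating_from_classes helper are all hypothetical; the helper is a fallback that derives the rating from the class name quoted in the question when no title is available.

```python
import re
from html.parser import HTMLParser


class StarParser(HTMLParser):
    """Collect (title, class list) for every <div> whose classes include 'i-stars'."""

    def __init__(self):
        super().__init__()
        self.stars = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        if tag == "div" and "i-stars" in classes:
            # A missing title is stored as None so the caller can detect it
            self.stars.append((attributes.get("title"), classes))


def rating_from_classes(classes):
    """Fallback: derive the rating from a class such as 'i-stars--regular-3-half'."""
    for name in classes:
        match = re.search(r"regular-(\d)(-half)?$", name)
        if match:
            return int(match.group(1)) + (0.5 if match.group(2) else 0.0)
    return None


# Hypothetical markup modelled on the classes quoted in the question
sample_html = """
<div class="biz-rating biz-rating-large clearfix">
  <div class="i-stars i-stars--regular-4 rating-large" title="4.0 star rating"></div>
</div>
<div class="biz-rating biz-rating-large clearfix">
  <div class="i-stars i-stars--regular-3-half rating-large" title="3.5 star rating"></div>
</div>
"""

star_parser = StarParser()
star_parser.feed(sample_html)
titles = [title for title, _ in star_parser.stars]
ratings = [rating_from_classes(classes) for _, classes in star_parser.stars]
print(titles)
print(ratings)
```

An empty star_parser.stars list means no element with the i-stars class was found, which also answers the "is the class present or not" part of the question.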
Answer 1: (score: 0)
I used this to solve the problem:
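import re
import requests
from lxml import html
from unidecode import unidecode

# place, city and id (the result offset) are assumed to be defined by the caller
yelp_url = "https://www.yelp.com/search?find_desc=%s&find_loc=%s&start=%s"%(place,city,str(id))
headers1 = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/42.0.2311.90 Safari/537.36'}
# Pass the headers so the request actually sends the browser User-Agent
response1 = requests.get(yelp_url, headers=headers1).text
parser = html.fromstring(response1)
print("Parsing the page")
listing1 = parser.xpath("//li[@class='regular-search-result']")
for results in listing1:
    # raw_ratings, cleaned_ratings, raw_price_range, raw_address,
    # is_reservation_available and is_accept_pickup come from XPath
    # queries on `results` that are not shown in this excerpt
    if raw_ratings:
        ratings = re.findall(r"\d+[.,]?\d+",cleaned_ratings)[0]
    else:
        ratings = 0
    price_range = len(''.join(raw_price_range)) if raw_price_range else 0
    # Collapse runs of whitespace, then transliterate accented characters to ASCII
    address = ' '.join(' '.join(raw_address).split())
    address=unidecode(address)
    reservation_available = bool(is_reservation_available)
    accept_pickup = bool(is_accept_pickup)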
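The rating step in the loop above can be isolated into a small helper, sketched here under the assumption that the rating text looks like "4.0 star rating" or uses a decimal comma. The name parse_rating and the sample strings are hypothetical. The pattern used in this sketch differs from `\d+[.,]?\d+` in the answer, which needs at least two digits and therefore finds nothing in a bare "4"; this one makes the decimal part optional and returns a float instead of a string.

```python
import re

# One or more digits, optionally followed by a '.' or ',' and more digits
RATING_PATTERN = re.compile(r"\d+(?:[.,]\d+)?")


def parse_rating(title):
    """Return the first number in a rating title as a float, or 0.0 if there is none."""
    match = RATING_PATTERN.search(title or "")
    if match is None:
        return 0.0
    # Normalise a decimal comma to a dot before converting
    return float(match.group(0).replace(",", "."))


print(parse_rating("4.0 star rating"))
print(parse_rating("3,5 étoiles"))
print(parse_rating("4 star rating"))
```

Returning 0.0 for a missing rating mirrors the `ratings = 0` branch in the answer; returning None instead may be preferable if a real zero and a missing value need to be told apart.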
Answer 2: (score: 0)
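# Fragment from inside a loop over search results: `results` is one result element,
# and raw_ratings / cleaned_ratings are assumed to be extracted earlier (not shown)
raw_review_count = results.xpath(".//span[contains(@class,'review-count')]//text()")
raw_price_range = results.xpath(".//span[contains(@class,'price-range')]//text()")
if raw_ratings:
    ratings = re.findall(r"\d+[.,]?\d+",cleaned_ratings)[0]
else:
    ratings = 0
price_range = len(''.join(raw_price_range)) if raw_price_range else 0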
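The XPath calls above return lists of text nodes, which usually still carry whitespace and labels. The following is a rough sketch of how those lists might be cleaned up. The helper names parse_review_count and parse_price_range and the sample inputs are hypothetical, and the price helper strips surrounding whitespace before counting, which the one-liner above does not.

```python
import re


def parse_review_count(raw_review_count):
    """Return the first integer found in the XPath text nodes, or 0 if there is none."""
    text = "".join(raw_review_count)
    match = re.search(r"\d+", text)
    return int(match.group(0)) if match else 0


def parse_price_range(raw_price_range):
    """Return the number of price symbols, ignoring surrounding whitespace."""
    return len("".join(raw_price_range).strip())


# Hypothetical text nodes, shaped like what the XPath queries might return
print(parse_review_count(["\n    12 avis\n"]))
print(parse_price_range([" €€ "]))
```

Both helpers return 0 for an empty list, matching the `else 0` fallback used in the fragment.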