I want to get '8.0' from the following HTML:
<div class="js-otelpuani" style="float: left;">
"8.0"
<span class="greyish" style="font-size:13px; font-family:arial;"> /10</span>
</div>
I have tried the following code to extract the '8.0' inside div class='js-otelpuani', but it does not seem to work:
import urllib
import requests
from bs4 import BeautifulSoup
import pyodbc
headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5)",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-charset": "cp1254,ISO-8859-9,utf-8;q=0.7,*;q=0.3",
    "accept-encoding": "gzip,deflate,sdch",
    "accept-language": "tr,tr-TR,en-US,en;q=0.8",
}
r = requests.get('https://www.otelz.com/otel/elvin-deluxehotel#.WkDIBd9l_IU', headers=headers)
if r.status_code != 200:
    print("request denied")
else:
    print("ok")
    soup = BeautifulSoup(r.text)
    score = soup.find('div', attrs={'class': 'js-otelpuani'})
    print(score)
This is the output I get, but unfortunately it does not contain the '8.0' value I want to extract:
ok
<div class="js-otelpuani" style="float: left;">
<span id="comRatingValue">.0</span>
<span class="greyish" style="font-size: 13px; font-family: arial;">
/
<span itemprop="bestRating">10</span></span>
<span id="comRatingCount" itemprop="ratingCount" style="display: none;">0</span>
<span id="comReviewCount" itemprop="reviewCount" style="display: none;">0</span>
</div>
I would appreciate any help!
Answer 0 (score: 2)
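If you inspect the page's HTML source and search for js-otelpuani, you will notice that it is also used inside a script tag. If you follow that script's logic, you will see that the rating itself is loaded by a separate request to the GeneralPartial/Degerlendirmeler/8974 endpoint, where 8974 is the hotel ID.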
Let's replicate exactly that logic in your script: first extract the hotel ID, then make the separate request and extract the rating value:
import requests
from bs4 import BeautifulSoup
headers = {
    "user-agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_5)",
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "accept-charset": "cp1254,ISO-8859-9,utf-8;q=0.7,*;q=0.3",
    "accept-encoding": "gzip,deflate,sdch",
    "accept-language": "tr,tr-TR,en-US,en;q=0.8",
}
with requests.Session() as session:
    # send the same headers with every request made through this session
    session.headers.update(headers)
    r = session.get('https://www.otelz.com/otel/elvin-deluxehotel#.WkDIBd9l_IU')
    if r.status_code != 200:
        print("request denied")
    else:
        print("ok")
        soup = BeautifulSoup(r.text, "html.parser")
        # get the hotel id
        hotel_id = soup.find(attrs={"data-hotelid": True})["data-hotelid"]
        # go for the hotel rating
        response = session.get("https://www.otelz.com/GeneralPartial/Degerlendirmeler/{hotel_id}".format(hotel_id=hotel_id))
        soup = BeautifulSoup(response.text, "html.parser")
        rating_value = soup.find(attrs={'data-rating-value': True})['data-rating-value']
        print(rating_value)
Prints:
8.0
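The two parsing steps in this answer (read `data-hotelid`, build the ratings URL, read `data-rating-value`) do not depend on BeautifulSoup. Below is a minimal offline sketch of them using only the standard library's `html.parser`. The sample markup strings are hypothetical stand-ins for the real responses, which are not shown here; only the attribute names, the hotel ID 8974 and the endpoint path come from the answer, and no network request is made.

```python
from html.parser import HTMLParser


class FirstAttrValue(HTMLParser):
    """Remember the value of attr_name from the first start tag that gives it one."""

    def __init__(self, attr_name):
        super().__init__()
        self.attr_name = attr_name
        self.value = None

    def handle_starttag(self, tag, attrs):
        if self.value is not None:
            return  # already found, ignore later tags
        for name, value in attrs:
            if name == self.attr_name:
                self.value = value
                break


def first_attr(html, attr_name):
    """Return the first value of attr_name in html, or None if it never appears."""
    parser = FirstAttrValue(attr_name)
    parser.feed(html)
    parser.close()
    return parser.value


# Hypothetical stand-ins for the two responses; the surrounding markup is made up.
hotel_page = '<div class="hotel" data-hotelid="8974"></div>'
ratings_partial = '<div class="rating" data-rating-value="8.0"></div>'

# Step 1: the hotel ID determines the ratings endpoint.
hotel_id = first_attr(hotel_page, "data-hotelid")
url = "https://www.otelz.com/GeneralPartial/Degerlendirmeler/{hotel_id}".format(hotel_id=hotel_id)
print(url)

# Step 2: the rating is read from an attribute in the partial's markup.
rating_value = first_attr(ratings_partial, "data-rating-value")
print(rating_value)
```

Against the real site you would feed `r.text` and `response.text` instead of the hypothetical strings.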
Answer 1 (score: 1)
You should use something like this:
soup.find('div', {'class': 'js-otelpuani'}).text
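`.text` concatenates every text node under the element, so what it returns depends on which HTML it is given. The sketch below mimics that behaviour with the standard library's `html.parser` (a stand-in, not BeautifulSoup itself) and applies it to the two snippets from the question: the static output that `requests` received, and the DevTools view of the rendered page (with the quotes DevTools draws around "8.0" removed).

```python
from html.parser import HTMLParser

# Static HTML as printed in the question (what requests receives).
STATIC_HTML = """
<div class="js-otelpuani" style="float: left;">
<span id="comRatingValue">.0</span>
<span class="greyish" style="font-size: 13px; font-family: arial;">
/
<span itemprop="bestRating">10</span></span>
<span id="comRatingCount" itemprop="ratingCount" style="display: none;">0</span>
<span id="comReviewCount" itemprop="reviewCount" style="display: none;">0</span>
</div>
"""

# Rendered markup from the question's DevTools view (after JavaScript has run).
RENDERED_HTML = """
<div class="js-otelpuani" style="float: left;">
8.0
<span class="greyish" style="font-size:13px; font-family:arial;"> /10</span>
</div>
"""


class DescendantText(HTMLParser):
    """Collect every text node inside elements whose class list contains target_class.

    Assumes no void elements (e.g. <br>) inside the target, since depth is
    tracked by matching start and end tags.
    """

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # nesting level inside the target element; 0 = outside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def element_text(html, target_class):
    parser = DescendantText(target_class)
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so the result is easy to read.
    return " ".join("".join(parser.chunks).split())


print(element_text(STATIC_HTML, "js-otelpuani"))    # .0 / 10 0 0
print(element_text(RENDERED_HTML, "js-otelpuani"))  # 8.0 /10
```

On the static HTML the text contains the `.0` placeholder rather than `8.0`; on the rendered markup it also includes the `/10` from the child `<span>`.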
Answer 2 (score: 1)
If you are willing to use Selenium, the data you are after can be parsed easily, as shown below:
from bs4 import BeautifulSoup
from selenium import webdriver
driver = webdriver.Chrome()
driver.get('https://www.otelz.com/otel/elvin-deluxehotel#.WkDf39KWa1t')
soup = BeautifulSoup(driver.page_source, "lxml")
for item in soup.select('.js-otelpuani'):
    # remove the child <span> elements (the "/10" suffix etc.) so only the score text is left
    for elem in item.find_all("span"):
        elem.extract()
    print(item.text.strip())
driver.quit()
Output:
8.0
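Removing every child `<span>` and then reading `.text` leaves only the div's own text nodes. A minimal standard-library sketch of that idea (using `html.parser` in place of BeautifulSoup and Selenium), run against the rendered markup reconstructed from the question's DevTools view; it assumes, as that view suggests, that the score is a direct text node of the div.

```python
from html.parser import HTMLParser

# Rendered markup from the question's DevTools view (after JavaScript has run).
RENDERED_HTML = """
<div class="js-otelpuani" style="float: left;">
8.0
<span class="greyish" style="font-size:13px; font-family:arial;"> /10</span>
</div>
"""


class OwnText(HTMLParser):
    """Keep only the direct text nodes of elements with target_class, skipping child tags."""

    def __init__(self, target_class):
        super().__init__()
        self.target_class = target_class
        self.depth = 0  # 1 = directly inside the target, >1 = inside a child element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            self.depth += 1
        elif self.target_class in (dict(attrs).get("class") or "").split():
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth == 1:  # ignore text inside child <span> elements
            self.chunks.append(data)


def own_text(html, target_class):
    parser = OwnText(target_class)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks).strip()


print(own_text(RENDERED_HTML, "js-otelpuani"))  # 8.0
```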