My goal is to scrape the link to each hotel, but the page source does not contain an <a>
element at all. What should I do? How does the website hide its links?
There should be a link for each hotel name, but the source code looks like this:
<h3 class="hotel-name" data-selenium="hotel-name">Hilton Osaka</h3>
How can I scrape the URLs?
Answer 0 (score: 0)
There is more work to do, because the hotel names only appear as you scroll down, but for the links at least, this should get you started:
from selenium import webdriver
from bs4 import BeautifulSoup as soup
url = 'https://www.agoda.com/pages/agoda/default/DestinationSearchResult.aspx?city=9590&checkIn=2019-02-05&los=1&rooms=1&adults=2&children=0&cid=-218&languageId=1&userId=bce6a6f2-6f57-418a-9c86-487872685cda&sessionId=ku5ccopu4cm2yqjetfge1fa4&pageTypeId=1&origin=HK&locale=en-US&aid=130589&currencyCode=HKD&htmlLanguage=en-us&cultureInfoName=en-US&ckuid=bce6a6f2-6f57-418a-9c86-487872685cda&prid=0&checkOut=2019-02-06&priceCur=HKD&textToSearch=Osaka&productType=-1&travellerType=1'
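# opening up connection, grabbing the page
# (Selenium 3 style call; Selenium 4.10+ expects the driver path via
# service=Service(...) from selenium.webdriver.chrome.service instead)
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')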
driver.get(url)
# html parsing
page_soup = soup(driver.page_source, "html.parser")
containers = page_soup.find_all("li", {'data-selenium':'hotel-item'})
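for ele in containers:
    # Take the first anchor that carries an href; leave the link empty if there is none
    anchor = ele.find('a', href=True)
    link = 'https://www.agoda.com' + anchor['href'] if anchor else ''
    # The hotel name sits in the <h3>; leave it empty if the item has not rendered yet
    heading = ele.find('h3')
    name = heading.text if heading else ''
    print('Hotel: %s\nLink: %s\n' % (name, link))
# quit() closes the browser window and also stops the chromedriver process
driver.quit()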
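As for the "more work" mentioned above: since the hotel names only load as you scroll, one option is to scroll the page until it stops growing before reading driver.page_source. The following is only a minimal sketch. The helper name scroll_to_bottom and its pause and max_rounds parameters are made up for illustration, and it assumes the page grows document.body.scrollHeight as new results load, which I have not verified for this site.

```python
import time


def scroll_to_bottom(driver, pause=1.0, max_rounds=20):
    """Scroll until the page height stops growing (hypothetical helper).

    `driver` is any object with Selenium's execute_script(script) method.
    Returns the last observed page height.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_rounds):
        # Jump to the current bottom so the page loads its next batch of results
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        # Give the page time to fetch and render the new items
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # Nothing new was added, so assume the end of the list was reached
            break
        last_height = new_height
    return last_height
```

You would call scroll_to_bottom(driver) right after driver.get(url) and before building page_soup. The max_rounds cap is there so the loop still ends on a page that keeps loading forever.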