Question

My goal is to scrape the link to each hotel, but the source does not have an <a> element at all. What should I do? How did the website hide the links?

There should be a link for each name, but the source code looks like this:

<h3 class="hotel-name" data-selenium="hotel-name">Hilton Osaka</h3>


How do I scrape the URLs?

Answer 1

There is more work to do, because the hotel names only appear as you scroll down, but the links at least should get you going:

from selenium import webdriver
from bs4 import BeautifulSoup as soup

url = 'https://www.agoda.com/pages/agoda/default/DestinationSearchResult.aspx?city=9590&checkIn=2019-02-05&los=1&rooms=1&adults=2&children=0&cid=-218&languageId=1&userId=bce6a6f2-6f57-418a-9c86-487872685cda&sessionId=ku5ccopu4cm2yqjetfge1fa4&pageTypeId=1&origin=HK&locale=en-US&aid=130589&currencyCode=HKD&htmlLanguage=en-us&cultureInfoName=en-US&ckuid=bce6a6f2-6f57-418a-9c86-487872685cda&prid=0&checkOut=2019-02-06&priceCur=HKD&textToSearch=Osaka&productType=-1&travellerType=1'

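# opening up connection, grabbing the page
# (passing the chromedriver path directly is Selenium 3 style;
#  Selenium 4 expects it to be wrapped in a Service object)
driver = webdriver.Chrome('C:/chromedriver_win32/chromedriver.exe')
driver.get(url)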

# html parsing
page_soup = soup(driver.page_source, "html.parser")
containers = page_soup.find_all("li", {'data-selenium':'hotel-item'})

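for ele in containers:
    # look for an <a> with an href anywhere inside the list item,
    # rather than inside the <h3> itself
    anchor = ele.find('a', href=True)
    link = 'https://www.agoda.com' + anchor['href'] if anchor else ''

    heading = ele.find('h3')
    name = heading.text if heading else ''

    print('Hotel: %s\nLink: %s\n' % (name, link))

# quit() closes the browser and ends the chromedriver session
driver.quit()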
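
Since the hotel names only show up as you scroll, one way to handle the "more work" part might be to keep scrolling until the page stops growing, and only then read driver.page_source. The following is a minimal sketch, not something tested against Agoda itself: the helper name scroll_until_stable and its pause and max_rounds parameters are made up for illustration, and it assumes the page loads more items when the window is scrolled to the bottom.

```python
import time


def scroll_until_stable(driver, pause=1.0, max_rounds=30):
    """Scroll to the bottom repeatedly until the page height stops growing.

    `driver` is a Selenium WebDriver; only execute_script() is used.
    Returns the number of scrolls that were performed.
    """
    last_height = driver.execute_script("return document.body.scrollHeight")
    for rounds in range(1, max_rounds + 1):
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)  # give lazily loaded items time to render
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            # nothing new was loaded by the last scroll, so stop
            return rounds
        last_height = new_height
    return max_rounds  # gave up after max_rounds scrolls
```

You would call scroll_until_stable(driver) right after driver.get(url) and before building page_soup. If the site loads results inside an inner scrollable container, or only while the viewport passes each item, this simple version may not be enough and scrolling in smaller steps could be needed.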

What if no <a> element exists in the source code?
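
One thing worth checking before concluding that there is no link at all: the <a> may not be inside the <h3>, but wrapped around it or sitting elsewhere in the same list item. Below is a rough sketch of that lookup on a made-up HTML sample. The markup, the href path, and the helper name find_hotel_link are all hypothetical, and only the data-selenium attribute values come from the post above. If neither lookup finds anything, the function returns None, which would suggest the link is attached by JavaScript and cannot be read from the static HTML.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

BASE_URL = "https://www.agoda.com"

# Hypothetical markup: the first <h3> is wrapped by an <a>,
# the second item has no <a> anywhere.
SAMPLE_HTML = """
<ol>
  <li data-selenium="hotel-item">
    <a href="/hilton-osaka/hotel/osaka-jp.html">
      <h3 class="hotel-name" data-selenium="hotel-name">Hilton Osaka</h3>
    </a>
  </li>
  <li data-selenium="hotel-item">
    <h3 class="hotel-name" data-selenium="hotel-name">No Link Hotel</h3>
  </li>
</ol>
"""


def find_hotel_link(h3, base_url=BASE_URL):
    """Return an absolute URL for a hotel heading, or None if no <a> is nearby."""
    # 1. an <a> that wraps the heading
    anchor = h3.find_parent("a", href=True)
    # 2. otherwise, any <a> inside the same list item
    if anchor is None:
        item = h3.find_parent("li")
        anchor = item.find("a", href=True) if item is not None else None
    # urljoin copes with both relative and absolute href values
    return urljoin(base_url, anchor["href"]) if anchor is not None else None


page_soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
links = {
    h3.get_text(strip=True): find_hotel_link(h3)
    for h3 in page_soup.find_all("h3", {"data-selenium": "hotel-name"})
}

for name, link in links.items():
    print("Hotel: %s\nLink: %s\n" % (name, link))
```

With Selenium you would feed driver.page_source into BeautifulSoup instead of SAMPLE_HTML, so that the HTML includes whatever the page's scripts have rendered.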
