Question

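I am currently trying to get the latitude and longitude of individual restaurants on the TripAdvisor website. I am looking through the HTML of this restaurant in Hong Kong.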

Restaurant I am attempting to scrape from

In the HTML I found this:

HTML Code with the Latitude and Longitude

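I want to scrape the latitude and longitude from here, but when I try to print them, I can't seem to get them out. My code is below; any suggestions would be helpful.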

# import libraries
import requests
from bs4 import BeautifulSoup
import csv

# loop over the listing pages; entries are in increments of 30 per page
# (raise the stop value of range() when you want more than the first 30)
for i in range(0, 1, 30):
    # url format offsets the restaurants in increments of 30 after the oa
    url1 = 'https://www.tripadvisor.com/Restaurants-g294217-oa' + str(i) + '-Hong_Kong.html#EATERY_LIST_CONTENTS'
    r1 = requests.get(url1)
    data1 = r1.text
    soup1 = BeautifulSoup(data1, "html.parser")
    # each restaurant link carries the CSS class 'property_title'
    for link in soup1.find_all('a', class_='property_title'):
        restaurant_url = 'https://www.tripadvisor.com/Restaurant_Review-g294217-' + link.get('href')
        r2 = requests.get(restaurant_url)
        data2 = r2.text
        soup2 = BeautifulSoup(data2, "html.parser")
        # print only the inline scripts whose text mentions 'lat'
        for script in soup2.find_all('script', {'type': 'text/javascript'}):
            if script.string and 'lat' in script.string:
                print(script.string)

Answer 1

To scrape a page whose content is generated by JavaScript, you need to use Selenium.

Using Python
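The answer gives no code, so the following is only a minimal sketch of what the Selenium approach might look like. Several things here are assumptions rather than anything confirmed by the thread: headless Chrome as the browser, the helper names `fetch_rendered_html` and `extract_lat_lng`, and above all the shape of the inline script, which is represented by a made-up `sample` string because the original screenshot of the HTML is not available. The regular expression would likely need adjusting to the real markup.

```python
import re

# Hypothetical pattern: matches `lat: 22.28 ... lng: 114.15` as well as the
# JSON-style `"lat":"22.28","lng":"114.15"`. Adjust it to the real page source.
LAT_LNG_RE = re.compile(
    r'\blat"?\s*:\s*"?(-?\d+(?:\.\d+)?)'
    r'.*?'
    r'\blng"?\s*:\s*"?(-?\d+(?:\.\d+)?)',
    re.DOTALL,
)


def extract_lat_lng(html):
    """Return (lat, lng) as floats, or None if no coordinate pair is found."""
    match = LAT_LNG_RE.search(html)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


def fetch_rendered_html(url):
    """Load url in headless Chrome and return the page source after scripts run."""
    # Imported here so extract_lat_lng stays usable without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.options import Options

    options = Options()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always close the browser, even if the page load fails.
        driver.quit()


# Made-up sample standing in for the restaurant page's inline script.
sample = '<script type="text/javascript">var map = {lat: 22.2819, lng: 114.1556};</script>'
print(extract_lat_lng(sample))
```

On a real page, one would presumably call `extract_lat_lng(fetch_rendered_html(restaurant_url))` inside the loop from the question. If the coordinates already appear in the raw response, `extract_lat_lng(r2.text)` may be enough and the browser can be skipped.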
