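I am currently trying to get the latitude and longitude of individual restaurants from the TripAdvisor website. I am looking at the HTML of this restaurant in Hong Kong: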
Restaurant I am attempting to scrape from
In the HTML I found this:
HTML Code with the Latitude and Longitude
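I want to scrape the latitude and longitude from here, but when I try to print it, I can't seem to pull it out. My code is below; any suggestions would be helpful.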
# import libraries
import requests
from bs4 import BeautifulSoup
import csv

# loop to move into the next pages. entries are in increments of 30 per page
# (range(0, 1, 30) only yields 0; raise the stop value when you want more than 30)
for i in range(0, 1, 30):
    # url format offsets the restaurants in increments of 30 after the oa
    url1 = 'https://www.tripadvisor.com/Restaurants-g294217-oa' + str(i) + '-Hong_Kong.html#EATERY_LIST_CONTENTS'
    r1 = requests.get(url1)
    data1 = r1.text
    soup1 = BeautifulSoup(data1, "html.parser")

    # each restaurant in the listing is an <a> with class "property_title"
    for link in soup1.find_all('a', class_='property_title'):
        restaurant_url = 'https://www.tripadvisor.com/Restaurant_Review-g294217-' + link.get('href')
        # print(restaurant_url)
        r2 = requests.get(restaurant_url)
        data2 = r2.text
        soup2 = BeautifulSoup(data2, "html.parser")

        # this is the lookup that does not give me the coordinates
        for script in soup2.find_all('script', {'type', 'text/javascript', 'lat'}):
            print(script.string)
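
One possible direction, offered as a minimal sketch rather than a confirmed fix: instead of filtering the script tags by attributes, take the text of each script tag and search it for the coordinates with a regular expression. The sample script text below is made up, and the key names `lat`, `lng`, `lon` and `long` are assumptions, because the real markup only appears in the screenshot. `extract_lat_lng` is a hypothetical helper name.

```python
import re

# A number after a key, allowing optional quotes and ':' or '=' as separator.
# These patterns are an assumption about how the page embeds the coordinates.
NUMBER = r'["\']?\s*[:=]\s*["\']?(-?\d+(?:\.\d+)?)'
LAT_RE = re.compile(r'\blat' + NUMBER)
LNG_RE = re.compile(r'\b(?:lng|lon|long)' + NUMBER)


def extract_lat_lng(script_text):
    """Return (lat, lng) as floats if both are found in script_text, else None."""
    if not script_text:
        # script.string is None for empty or multi-child script tags
        return None
    lat = LAT_RE.search(script_text)
    lng = LNG_RE.search(script_text)
    if lat and lng:
        return float(lat.group(1)), float(lng.group(1))
    return None


# Hypothetical inline script content, standing in for the real page.
sample_script = """
var map = { lat: 22.28190, lng: 114.15560, zoom: 15 };
"""

print(extract_lat_lng(sample_script))
```

Inside the scraping loop this could be called on `script.string` for every `script` in `soup2.find_all('script')`, keeping the first result that is not `None`. If the coordinates are only filled in by JavaScript after the page loads, they will not be in `r2.text` at all and this approach will find nothing.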