I want to extract theatre locations by searching for a zip code and then scraping the results. Inspecting the site shows this:
<form id="set-location-form" class="ip-geoloc-address" action="/theatres" method="post" accept-charset="UTF-8"><div><button class="btn btn-default form-submit" id="edit-find" name="op" value=" " type="submit"> </button>
<input type="hidden" name="form_build_id" value="form-C5B0Dm8QYZgOzeTv2uf9FlNjWVK-EbcLpDKjRz_HQt4" />
<input type="hidden" name="form_id" value="ip_geoloc_set_location_form" />
<div class="form-type-textfield form-item-street-address form-item form-group">
<input placeholder="Enter your location" class="form-control form-text" type="text" id="edit-street-address" name="street_address" value="" size="60" maxlength="128" />
</div>
<button class="btn btn-default form-submit" id="edit-submit-address" name="op" value="Go" type="submit">Go</button>
<button class="change-view btn-map-expand btn btn-default form-submit" id="edit-map-expand" name="op" value="Map" type="button">Map</button>
<button class="change-view btn btn-default form-submit" id="edit-change-view" name="op" value="" type="button"></button>
A screenshot of the inspector output accompanied the original post.
But when I view the page source, the form is not there:
<div class="region region-content">
<section id="block-system-main" class="block block-system clearfix">
<div class="view view-theatres view-id-theatres view-display-id-page view-dom-id-8a00da3218aaa60e6d4d49fd07033c0b wrapper-container-box">
<div class="attachment attachment-before fix-wrapper">
<div class="view view-theatres view-id-theatres view-display-id-attachment_1">
<div class="view-content">
<div class="ip-geoloc-map view-based-map">
I tried these two snippets, but neither worked.
import requests
url = 'https://www.imax.com/theatres/'
data = {'street_address': '78759'}
r = requests.get(url, params=data)
with open("requests_results.html", "wb") as f:
    f.write(r.content)
# Second attempt: send the same field with POST
import requests
from bs4 import BeautifulSoup
data = {'street_address': '94704'}
# Get the page
# use .post
# send the data
url = "https://www.imax.com/theatres/"
response = requests.post(url, data=data)
doc = BeautifulSoup(response.text, 'html.parser')
Any help is appreciated, thanks!
Answer 0 (score: 0)
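The page uses latitude and longitude to request its data. You can mimic the XHR; first get the latitude and longitude for the location you pass in. I use a free API for that, but how you do it is up to you.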
You can see the request being made in the browser's network tab.
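To make the shape of that request concrete, here is a minimal sketch that builds the XHR URL from coordinates. The endpoint and the parameter names (date, lat, lon) are copied from the code in this answer and may no longer match the site; the helper name build_theatres_url and the example coordinates are made up for illustration.

```python
from urllib.parse import urlencode

# Endpoint and parameter names are copied from the code in this answer;
# they are assumptions about the site and may no longer be valid.
THEATRES_AJAX_URL = "https://www.imax.com/showtimes/ajax/theatres"


def build_theatres_url(lat, lng, date):
    """Return the XHR URL for theatres near (lat, lng) on the given date."""
    query = urlencode({"date": date, "lat": lat, "lon": lng})
    return f"{THEATRES_AJAX_URL}?{query}"


# Example coordinates are invented for illustration.
example_url = build_theatres_url(30.4, -97.75, "2019-03-09")
print(example_url)
```

Letting urlencode assemble the query string avoids manual concatenation and escapes any characters that need it.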
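Each entry in 'rows' of the JSON response holds HTML stored against a key.
Because the inner value associated with the 'row' key in each entry of 'rows' is HTML, I pass it back into BeautifulSoup for processing.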
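The idea can be tried offline first. The sketch below assumes the rows -> row structure and the .theatre-address class used in this answer; the payload itself, the theatre name, and the helper parse_rows are invented for illustration and are not real site data.

```python
import json

from bs4 import BeautifulSoup

# Hypothetical payload: the structure ("rows" -> "row") follows this answer,
# but the HTML fragment itself is invented for illustration.
sample_response = json.dumps({
    "rows": [
        {"row": '<div><a href="/theatres/example">Example Theatre</a>'
                '<p class="theatre-address"> 1 Example St </p></div>'}
    ]
})


def parse_rows(payload):
    """Parse each HTML fragment under rows[i]['row'] into (link, address)."""
    parsed = []
    for item in json.loads(payload)["rows"]:
        fragment = BeautifulSoup(item["row"], "html.parser")
        link = fragment.select_one("a")["href"]
        address = fragment.select_one(".theatre-address").get_text(strip=True)
        parsed.append((link, address))
    return parsed


theatres = parse_rows(sample_response)
print(theatres)
```

Each fragment gets its own BeautifulSoup object, so selectors only ever match inside a single row.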
import json

import pandas as pd
import requests
from bs4 import BeautifulSoup as bs

apiKey = "yourFreeAPIkey"
address = "78759"

# Geocode the zip code to latitude/longitude with the OpenCage API
url = "https://api.opencagedata.com/geocode/v1/json?q=" + address + "&key=" + apiKey + "&pretty=1"
res = requests.get(url).json()
# Index 1 selects the second geocoding match; adjust it if another result fits your location
data = res['results'][1]['geometry']
lat = data['lat']
lng = data['lng']

# Mimic the XHR the page makes for theatres near those coordinates
date = '2019-03-09'
res = requests.get('https://www.imax.com/showtimes/ajax/theatres?date=' + date + '&lat=' + str(lat) + '&lon=' + str(lng))
soup = bs(res.content, 'lxml')
# The lxml parser wraps bare text in a <p> element, so read the JSON back from there
newData = json.loads(soup.select_one('p').text)

columns = ['movieTitle', 'movieLink', 'theatreLink', 'address', 'movieFormat', 'times']
baseURL = 'https://www.imax.com'
results = []

for row in newData['rows']:
    # Each row holds an HTML fragment, so parse it on its own
    soup = bs(row['row'], 'lxml')
    link = baseURL + soup.select_one('a')['href']
    address = soup.select_one('.theatre-address').text.strip()
    movieTitle = soup.select_one('.movie-title').text.strip()
    movieLink = baseURL + soup.select_one('.movie-title a')['href']
    movieFormat = soup.select_one('.movie-format').text.strip()
    times = [item.text.strip() for item in soup.select('.line-items a')]
    results.append([movieTitle, movieLink, link, address, movieFormat, times])

df = pd.DataFrame(results, columns=columns)
print(df)
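The geocoding step above indexes into the results list directly, which raises an IndexError when the geocoder returns too few matches. A guarded lookup could look like the sketch below. It assumes only the response shape this answer already relies on (results -> geometry -> lat/lng); the helper pick_coordinates and the sample data are hypothetical.

```python
def pick_coordinates(geocode_json, index=0):
    """Return (lat, lng) from a geocoding response, or None if unavailable.

    Assumes the response shape used in this answer:
    {"results": [{"geometry": {"lat": ..., "lng": ...}}, ...]}.
    """
    results = geocode_json.get("results", [])
    if not 0 <= index < len(results):
        return None
    geometry = results[index].get("geometry", {})
    if "lat" not in geometry or "lng" not in geometry:
        return None
    return geometry["lat"], geometry["lng"]


# Invented sample data, not a real API response.
sample = {"results": [{"geometry": {"lat": 30.4, "lng": -97.75}}]}
print(pick_coordinates(sample))     # (30.4, -97.75)
print(pick_coordinates(sample, 1))  # None
```

Returning None lets the caller decide whether to retry with a different query or stop, instead of failing partway through the script.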
Example results: