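I am trying to look up the police district for a given location on the Philly Police webpage. I have too many locations to do this by hand, so I am trying to automate the process with Python's requests library. The web form that holds the location value is as follows: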
<form id="search-form" method="post" action="districts/searchAddress">
<fieldset>
<div class="clearfix">
<label for="search-address-box"><span>Enter Your Street Address</span></label>
<div class="input">
<input tabindex="1" class="district-street-address-input" id="search-address-box" name="name" type="text" value="">
</div>
</div>
<div class="actions" style="float: left;">
<button tabindex="3" type="submit" class="btn btn-success">Search</button>
</div>
<a id="use-location" href="https://www.phillypolice.com/districts/index.html?_ID=7&_ClassName=DistrictsHomePage#" style="float: left; margin: 7px 0 0 12px;"><i class="icon-location-arrow"></i>Use Current Location</a>
<div id="current-location-display" style="display: none;"><p>Where I am right now.</p></div>
</fieldset>
</form>
However, when I try to POST or PUT to the webpage with the following:
r = requests.post('http://www.phillypolice.com/districts',data={'search-address-box':'425 E. Roosevelt Blvd'})
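I get error 405, POST not allowed. I then turned off JavaScript and tried to find the district on the webpage, and when I clicked submit I got the same 405 error message. So the form is definitely not submitted directly, and the district is found using JavaScript.
Is there a way to simulate 'clicking' the submit button with the requests library so that the JavaScript is triggered?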
Answer 0 (score: 2)
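The data is retrieved with a final GET request, which the page makes after first querying Google Maps for the coordinates.
You can set up a free account with the Bing Maps API and get the coordinates you need to make the GET request: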
Here is the code, minus my key:
import requests

key = "my_key"
coord_params = {"output": "json",
                "key": key}
# This provides the coordinates.
coords_url = "https://dev.virtualearth.net/REST/v1/Locations"
# Template to pass each address to in your actual loop.
template = "{add},US"
url = "https://api.phillypolice.com/jsonservice/Map/searchAddress.json"
with requests.Session() as s:
    # Add the query param, passing in each address.
    coord_params["query"] = template.format(add="425 E. Roosevelt Blvd")
    js = s.get(coords_url, params=coord_params).json()
    # Parse latitude and longitude from the returned json.
    # Call str to turn the coordinates into a single string for the `latlng` param.
    latitude_longitude = str((js[u'resourceSets'][0][u'resources'][0]["point"][u'coordinates']))
    data = s.get(url, params={"latlng": latitude_longitude})
    print(data.json())
If you look at the requests in your browser, you can see that this matches the response you see there.
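Because the question involves many addresses, note that the indexing chain in the code above raises an `IndexError` if a lookup comes back with an empty result list. The following is a minimal sketch of a more defensive extraction step, assuming the geocoder's JSON has the same `resourceSets` / `resources` / `point` / `coordinates` shape used above. The helper name `extract_latlng` and the two sample dictionaries are hypothetical and hand-written, not real API output.

```python
def extract_latlng(geocode_json):
    """Return the first result's coordinates as a string, or None if there is no match."""
    for resource_set in geocode_json.get("resourceSets", []):
        for resource in resource_set.get("resources", []):
            coords = resource.get("point", {}).get("coordinates")
            if coords and len(coords) == 2:
                # Same formatting as the answer's code: str() of the [lat, lon] list.
                return str(coords)
    return None


# Hypothetical, hand-written fragments in the assumed response shape.
sample_hit = {"resourceSets": [{"resources": [{"point": {"coordinates": [40.0, -75.0]}}]}]}
sample_miss = {"resourceSets": [{"resources": []}]}

print(extract_latlng(sample_hit))
print(extract_latlng(sample_miss))
```

In a loop over addresses, a `None` result could be logged and skipped rather than aborting the whole run.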
Answer 1 (score: 1)
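Two main things happen when you click "submit": a request is made to the Google geocoding service, and an XHR request is made to the "searchAddress.json" endpoint, which uses the coordinates returned by the geocoding service.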
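You can either try to simulate the above requests, carefully handling all the API keys and required parameters, or you can stay at a higher level and use browser automation via selenium.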
A working example using the PhantomJS headless browser:
In [2]: from selenium import webdriver
In [3]: driver = webdriver.PhantomJS()
In [4]: driver.get("https://www.phillypolice.com/districts/")
In [5]: address = "425 E. Roosevelt Blvd"
In [6]: search_box = driver.find_element_by_id("search-address-box")
In [7]: search_box.send_keys(address)
In [8]: search_box.submit()
In [9]: driver.find_element_by_css_selector("#district-menu h2").text
Out[9]: u'35th District'
In [10]: driver.find_element_by_css_selector("#district-menu h4").text
Out[10]: u'PSA 2'
Also, you might need Explicit Waits to handle the timing issues.
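The timing issue presumably arises because the district panel is filled in by JavaScript after the form is submitted, so reading the element immediately may fail or return empty text. Below is a minimal sketch of an explicit wait, assuming a Selenium version that provides `WebDriverWait` and `driver.find_element(By.CSS_SELECTOR, ...)`. The helper name `wait_for_text` and the 10-second default timeout are illustrative choices, not part of the original answer.

```python
def wait_for_text(driver, css_selector, timeout=10):
    """Block until the element matching css_selector has non-empty text, then return it.

    Raises selenium's TimeoutException if no text appears within `timeout` seconds.
    """
    # Imported inside the function so selenium is only required when it is called.
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def text_is_present(d):
        # A falsy return value (empty string) makes WebDriverWait poll again.
        # NoSuchElementException is ignored by WebDriverWait by default.
        return d.find_element(By.CSS_SELECTOR, css_selector).text

    return WebDriverWait(driver, timeout).until(text_is_present)
```

In the session above, this could stand in for the direct lookups, for example `wait_for_text(driver, "#district-menu h2")`.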