Python 3 requests library: submitting a form that does not allow POST requests

Time: 2016-10-08 01:24:11

Tags: javascript forms python-3.x web-scraping python-requests

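I am trying to look up the police district for a given location on the Philly Police webpage. I have too many locations to do this by hand, so I am trying to automate the process with Python's requests library. The web page's form that holds the location value looks like this: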

<form id="search-form" method="post" action="districts/searchAddress">
<fieldset>
    <div class="clearfix">
        <label for="search-address-box"><span>Enter Your Street Address</span></label>
        <div class="input">
            <input tabindex="1" class="district-street-address-input" id="search-address-box" name="name" type="text" value="">
        </div>
    </div>
    <div class="actions" style="float: left;">
        <button tabindex="3" type="submit" class="btn btn-success">Search</button>
    </div>
    <a id="use-location" href="https://www.phillypolice.com/districts/index.html?_ID=7&_ClassName=DistrictsHomePage#" style="float: left; margin: 7px 0 0 12px;"><i class="icon-location-arrow"></i>Use Current Location</a>
    <div id="current-location-display" style="display: none;"><p>Where I am right now.</p></div>
</fieldset>
</form>

However, when I try to POST or PUT to the page with the following:

r = requests.post('http://www.phillypolice.com/districts',data={'search-address-box':'425 E. Roosevelt Blvd'})

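I get a 405 error: POST is not allowed. I then turned off JavaScript and tried to find the district on the webpage itself, and when I clicked submit I got the same 405 error message. So the form is definitely not being submitted, and the district is found using JavaScript.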

Is there a way to simulate 'clicking' the submit button with the requests library so that the JavaScript is triggered?
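For reference, what a plain (non-JavaScript) submit of this form would send can be read from the markup itself: the field is keyed by the input's `name` attribute rather than its `id`, and the target comes from the form's `action`. The following is a minimal sketch using only the standard library; `FormInspector` and `inspect_form` are hypothetical names, the HTML is a trimmed copy of the form above, and the resolved URL depends on the page's real base URL. The server may still reject the POST, as the 405 with JavaScript disabled suggests.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormInspector(HTMLParser):
    """Collect the method, action and named inputs of the first <form>."""

    def __init__(self):
        super().__init__()
        self.method = None
        self.action = None
        self.fields = {}
        self._in_form = False
        self._done = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form" and not self._done and not self._in_form:
            self._in_form = True
            self.method = (attrs.get("method") or "get").lower()
            self.action = attrs.get("action") or ""
        elif tag == "input" and self._in_form and attrs.get("name"):
            # A browser submits inputs by their name attribute, not their id.
            self.fields[attrs["name"]] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "form" and self._in_form:
            self._in_form = False
            self._done = True


def inspect_form(html, page_url):
    """Return (method, absolute action URL, {field name: default value})."""
    parser = FormInspector()
    parser.feed(html)
    return parser.method, urljoin(page_url, parser.action), parser.fields


# Trimmed copy of the form from the question.
FORM_HTML = """
<form id="search-form" method="post" action="districts/searchAddress">
    <input id="search-address-box" name="name" type="text" value="">
</form>
"""

# Assumption: the page URL used in the question is the base for relative links.
method, target, fields = inspect_form(FORM_HTML, "http://www.phillypolice.com/districts")
print(method, target, fields)
```

Here the only submitted key is `name`, whereas the attempt in the question used the key `search-address-box` and posted to the page URL instead of the form's `action`.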

2 answers:

Answer 0: (score: 2)

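The page retrieves the data by first querying Google Maps for the coordinates; the final request, which returns the district data, is a GET request that passes those coordinates.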

You can set up a free account with the Bing Maps API and use it to get the coordinates needed for the GET request. Here is the code, with my key removed:

import requests

key = "my_key"
coord_params = {"output": "json",
                "key": key}

# This provides the coordinates.
coords_url = "https://dev.virtualearth.net/REST/v1/Locations"

# Template to pass each address to in your actual loop.
template = "{add},US"
url = "https://api.phillypolice.com/jsonservice/Map/searchAddress.json"
with requests.Session() as s:
    # Add the query param, passing in each address.
    coord_params["query"] = template.format(add="425 E. Roosevelt Blvd")
    js = s.get(coords_url, params=coord_params).json()
    # Parse latitude and longitude from the returned JSON.
    # Call str to turn the coordinate list into a single string for `latlng`.
    latitude_longitude = str(js['resourceSets'][0]['resources'][0]['point']['coordinates'])
    data = s.get(url, params={"latlng": latitude_longitude})

    print(data.json())

If you look at the request in your browser, you can see that the output matches the response you see there.
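Since the question involves many locations, the same two requests can be wrapped in a loop. This is a minimal sketch, assuming the two endpoints and the response shape used in the code above; `extract_latlng` and `lookup_districts` are hypothetical helper names, `session` is expected to be a `requests.Session`, and the 10-second timeout is an arbitrary choice.

```python
BING_LOCATIONS_URL = "https://dev.virtualearth.net/REST/v1/Locations"
SEARCH_URL = "https://api.phillypolice.com/jsonservice/Map/searchAddress.json"


def extract_latlng(geocode_json):
    """Return the first result's coordinates as a string, or None if there is no result."""
    for resource_set in geocode_json.get("resourceSets", []):
        for resource in resource_set.get("resources", []):
            # Same conversion as above: str() of the [lat, lon] list.
            return str(resource["point"]["coordinates"])
    return None


def lookup_districts(session, addresses, key):
    """Map each address to the decoded searchAddress.json response.

    An address that cannot be geocoded maps to None.
    """
    results = {}
    for address in addresses:
        params = {
            "output": "json",
            "key": key,
            "query": "{add},US".format(add=address),
        }
        geo = session.get(BING_LOCATIONS_URL, params=params, timeout=10)
        geo.raise_for_status()
        latlng = extract_latlng(geo.json())
        if latlng is None:
            results[address] = None
            continue
        resp = session.get(SEARCH_URL, params={"latlng": latlng}, timeout=10)
        resp.raise_for_status()
        results[address] = resp.json()
    return results
```

It would be called as `lookup_districts(s, ["425 E. Roosevelt Blvd"], key)` inside a `with requests.Session() as s:` block. Reusing one session keeps the connection open across addresses, and returning None for addresses that fail to geocode lets the loop continue instead of raising an IndexError.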

Answer 1: (score: 1)

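Two major things happen when you click "submit": a request is sent to the Google geocoding service, and an XHR request is sent to the "searchAddress.json" endpoint, which uses the coordinates returned by the geocoding service.

You can either try to mimic these requests, carefully handling all the API keys and required parameters, or stay at a higher level and use browser automation via selenium.

A working example using the PhantomJS headless browser: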

In [2]: from selenium import webdriver

In [3]: driver = webdriver.PhantomJS()

In [4]: driver.get("https://www.phillypolice.com/districts/")

In [5]: address = "425 E. Roosevelt Blvd"

In [6]: search_box = driver.find_element_by_id("search-address-box")

In [7]: search_box.send_keys(address)

In [8]: search_box.submit()

In [9]: driver.find_element_by_css_selector("#district-menu h2").text
Out[9]: u'35th District'

In [10]: driver.find_element_by_css_selector("#district-menu h4").text
Out[10]: u'PSA 2'

Also, you may need Explicit Waits to handle timing issues.
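An explicit wait polls a condition until it holds or a timeout expires, instead of reading the element immediately after submitting. Selenium provides `WebDriverWait` for this; the sketch below is a library-independent illustration of the same idea, not Selenium's implementation. `wait_until` is a hypothetical helper, and the timeout and polling interval are arbitrary defaults.

```python
import time


def wait_until(condition, timeout=10.0, poll_interval=0.5, ignored=(Exception,)):
    """Call condition() until it returns a truthy value or the timeout expires.

    Exceptions listed in `ignored` (for example "element not found yet")
    are treated as "not ready" and trigger another poll.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            value = condition()
        except ignored:
            value = None
        if value:
            return value
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within {} seconds".format(timeout))
        time.sleep(poll_interval)


# Hypothetical use with the session above, after search_box.submit():
# heading = wait_until(
#     lambda: driver.find_element_by_css_selector("#district-menu h2").text
# )
```

With Selenium itself, the equivalent is `WebDriverWait(driver, 10).until(...)` combined with a condition from `selenium.webdriver.support.expected_conditions`, such as `visibility_of_element_located`.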