Scraper using a POST request returns no results

Time: 2017-05-15 09:05:40

Tags: python web-scraping web-crawler

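I wrote a scraper that sends a POST request, but when I run it I get nothing back, and I can't work out what I'm doing wrong. When I inspected the form data in Chrome developer tools and pasted the full string into the console, it looked very strange, which is why I tried to simplify it. Any suggestions would be highly appreciated.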

import requests
from lxml import html

url = "http://www.golf.co.nz/PlayGolf/ClubDirectory.aspx"

def grab_data(address):
    payload={"ctl00$MainContent$cbRegion":"All Regions","ctl00$MainContent$cbHoleOpt":"Any number of Holes"}
    headers={'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.81 Safari/537.36'}
    response = requests.post(address, data=payload, headers = headers)
    tree=html.fromstring(response.text)
    names=tree.xpath("//td[@class='align-left']//h2/text()")
    for name in names:
        print(name)

grab_data(url)

Form data elements:

<div id="MainContent_pnlForm" onkeypress="javascript:return WebForm_FireDefaultButton(event, 'MainContent_btnSearch')">


                                <div id="MainContent_pnlFilters">

                                        <div>
                                            <select name="ctl00$MainContent$cbRegion" onchange="javascript:setTimeout('__doPostBack(\'ctl00$MainContent$cbRegion\',\'\')', 0)" id="MainContent_cbRegion" class="ddl-filter">
            <option selected="selected" value="0">All Regions</option>
            <option value="1">Aorangi Region</option>
            <option value="2">Auckland Region</option>
            <option value="3">Bay of Plenty Region</option>
            <option value="5">Canterbury Region</option>
            <option value="6">Hawkes Bay Region</option>
            <option value="7">Manawatu/Wanganui Region</option>
            <option value="10">North Harbour Region</option>
            <option value="11">Northland Region</option>
            <option value="12">Otago Region</option>
            <option value="13">Poverty Bay/E. Coast Region</option>
            <option value="14">Southland Region</option>
            <option value="15">Taranaki Region</option>
            <option value="9">Tasman Region</option>
            <option value="16">Waikato Region</option>
            <option value="17">Wellington Region</option>

        </select> 

                                            <select name="ctl00$MainContent$cbHoleOpt" onchange="javascript:setTimeout('__doPostBack(\'ctl00$MainContent$cbHoleOpt\',\'\')', 0)" id="MainContent_cbHoleOpt" class="ddl-filter">
            <option selected="selected" value="0">Any number of Holes</option>
            <option value="9">9 Holes</option>
            <option value="18">18 Holes</option>
            <option value="27">27 Holes</option>
            <option value="36">36 Holes</option>

        </select>


                                            <input name="ctl00$MainContent$tbSearch" type="text" value="SEARCH" size="10" id="tbSearch">

                                            <input type="submit" name="ctl00$MainContent$btnSearch" value="GO" id="MainContent_btnSearch" class="module-button-cta submit-filter">
                                         </div>

    </div>

</div>

1 answer:

Answer 0 (score: 0)

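Found the solution. The form data sent with the POST needs to match what Chrome developer tools shows for the real request. I tried to post my full answer here but couldn't, because the complete form data runs to about 75,000 characters and the post body is limited to 30,000.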
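
Since the full form data is too long to paste, here is a minimal sketch of one way to rebuild it in code instead of copying it by hand. It assumes the page is a standard ASP.NET WebForms page that keeps its state in hidden inputs such as `__VIEWSTATE` and `__EVENTVALIDATION`; that has not been verified against this site. `FormFieldCollector` and `build_payload` are hypothetical helper names. Note that the two dropdowns are sent by option value (`"0"`), not by their visible labels as in the question's payload.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect name/value pairs roughly as a browser would submit them."""

    # Input types that are not sent by default (buttons, unchecked boxes, files).
    SKIPPED_TYPES = ("submit", "button", "image", "reset",
                     "checkbox", "radio", "file")

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._select = None  # name of the <select> currently being parsed

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input":
            name = attrs.get("name")
            input_type = (attrs.get("type") or "text").lower()
            if name and input_type not in self.SKIPPED_TYPES:
                self.fields[name] = attrs.get("value") or ""
        elif tag == "select":
            self._select = attrs.get("name")
        elif tag == "option" and self._select:
            # First option is the default; a selected option overrides it.
            if "selected" in attrs or self._select not in self.fields:
                self.fields[self._select] = attrs.get("value") or ""

    def handle_endtag(self, tag):
        if tag == "select":
            self._select = None


def build_payload(page_html, overrides=None):
    """Return the page's form fields, updated with any explicit overrides."""
    collector = FormFieldCollector()
    collector.feed(page_html)
    payload = dict(collector.fields)
    payload.update(overrides or {})
    return payload


def grab_data(address):
    # Third-party imports are local so the helpers above work without them.
    import requests
    from lxml import html

    headers = {"User-Agent": "Mozilla/5.0"}
    with requests.Session() as session:
        # GET first so the hidden state fields can be read from the page.
        page = session.get(address, headers=headers)
        payload = build_payload(page.text, {
            "ctl00$MainContent$cbRegion": "0",    # value of "All Regions"
            "ctl00$MainContent$cbHoleOpt": "0",   # value of "Any number of Holes"
            "ctl00$MainContent$btnSearch": "GO",  # the clicked submit button
        })
        response = session.post(address, data=payload, headers=headers)
    tree = html.fromstring(response.text)
    return tree.xpath("//td[@class='align-left']//h2/text()")
```

Calling `grab_data(url)` with the URL from the question and printing each returned name would mirror the original script. If the result is still empty, comparing the `payload` dict against the form data in developer tools field by field should show what is missing.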