How to use requests

Time: 2017-07-09 07:30:50

Tags: python-3.x web-scraping python-requests

I want to fetch the web page for an option selected from a dropdown list, shown below:

 <h1>Scraping Test</h1>
    <form action="/tests/scraping" method='post'>
        <input type="hidden" name="csrf_token" value="1499585369##d2d1570f820aec0589b3bd5f4ab4e7df913e25ff"/>
        <table>
            <tr>
                <td>Select Ward: </td>
                <td>
                    <select name="ward">
                        <option value=''>-- select --</option>
                        <option value='DHANLAXMICOMPLEX'>DHANLAXMICOMPLEX</option>
                        <option value='POTALIYA'>POTALIYA</option>
                        <option value='ARJUNTOWER'>ARJUNTOWER</option>
                        <option value='NEWCLOTHMARKET'>NEWCLOTHMARKET</option>
                        <option value='CHANAKYAPURI'>CHANAKYAPURI</option>
                        <option value='BHAIKAKANAGAR'>BHAIKAKANAGAR</option>
                        <option value='RADHASWAMYROAD'>RADHASWAMYROAD</option>
                        <option value='SATADHAR'>SATADHAR</option>
                        <option value='AMRUTAVIDYALAYA'>AMRUTAVIDYALAYA</option>
                        <option value='AGARWALTOWERS'>AGARWALTOWERS</option>
                        <option value='RANNAPARK'>RANNAPARK</option>
                        <option value='IIM'>IIM</option>
                        <option value='VEJALPURWARD'>VEJALPURWARD</option>
                        <option value='GITAMANDIR'>GITAMANDIR</option>
                    </select>
                </td>
                <td><input type="submit" value="Search" class="search"/></td>
            </tr>
        </table>

How do I request the page for an option in this dropdown? There is also a Search button. My code:

import requests, csv
from lxml import html

def get_all_pages():

   payload = {'value':'DHANLAXMICOMPLEX'}
   url = requests.get('https://recruitment.advarisk.com/tests/scraping',data=payload)
   print(url.text)

1 answer:

Answer 0: (score: 1)

You can get the token value from this HTML element
<input type="hidden" name="csrf_token" value="1499585369##d2d1570f820aec0589b3bd5f4ab4e7df913e25ff"/>

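and use it in your request. The form is submitted with POST, and the dropdown's field name is `ward`, so the payload key must be `ward` rather than `value`. Try the following code, and let me know if you run into any problems.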

import lxml.html
import requests

url = "https://recruitment.advarisk.com/tests/scraping"

# Use one session so any cookies set by the GET are sent with the POST
s = requests.Session()
r = s.get(url)

# Read the CSRF token out of the hidden form field
source = lxml.html.document_fromstring(r.content)
token = source.xpath('//input[@name="csrf_token"]/@value')[0]

# Post the form: 'ward' is the name of the <select>, its value is the chosen option
headers = {'Referer': url}
data = {'csrf_token': token, 'ward': 'DHANLAXMICOMPLEX'}

print(s.post(url, data=data, headers=headers).text)
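
The question's function is named `get_all_pages`, so it may help to see how the same approach could be repeated for every option in the dropdown. The following is only a rough sketch: `FormFieldParser`, `parse_form`, and `fetch_all_wards` are hypothetical names, the parsing uses the standard library's `html.parser` instead of lxml to stay dependency-free, and refreshing the token from each response is an assumption about how the server behaves, not something the page confirms.

```python
from html.parser import HTMLParser


class FormFieldParser(HTMLParser):
    """Collects the csrf_token value and the non-empty options of the ward <select>."""

    def __init__(self):
        super().__init__()
        self.csrf_token = None
        self.wards = []
        self._in_ward_select = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "input" and attrs.get("name") == "csrf_token":
            self.csrf_token = attrs.get("value")
        elif tag == "select" and attrs.get("name") == "ward":
            self._in_ward_select = True
        elif tag == "option" and self._in_ward_select and attrs.get("value"):
            # The '-- select --' placeholder has an empty value and is skipped
            self.wards.append(attrs["value"])

    def handle_endtag(self, tag):
        if tag == "select":
            self._in_ward_select = False


def parse_form(page_html):
    """Return (csrf_token, list_of_ward_values) found in the page HTML."""
    parser = FormFieldParser()
    parser.feed(page_html)
    return parser.csrf_token, parser.wards


def fetch_all_wards(session, url):
    """Post the form once per ward and return {ward: response_text}.

    `session` is expected to behave like requests.Session (get/post methods).
    """
    token, wards = parse_form(session.get(url).text)
    headers = {"Referer": url}
    results = {}
    for ward in wards:
        response = session.post(
            url, data={"csrf_token": token, "ward": ward}, headers=headers
        )
        results[ward] = response.text
        # Assumption: the result page may carry a fresh token; reuse the old one if not
        new_token, _ = parse_form(response.text)
        token = new_token or token
    return results
```

With requests installed, this could be called as `fetch_all_wards(requests.Session(), "https://recruitment.advarisk.com/tests/scraping")`. Passing the session in as a parameter keeps the parsing logic usable without network access.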