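Sending an ASP.net POST with Python's requests

Time: 2014-07-26 22:06:39

Tags: python asp.net web-scraping python-requests

I am scraping an old ASP.net website using Python's requests module.

I have spent more than 5 hours trying to work out how to emulate this POST request, to no avail. Doing it the way I have below, I essentially get a message saying "No item matches this item reference."

Any help would be deeply appreciated. Here are the request and my code; some things have been modified for brevity and/or privacy:

My own code: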

import requests

# Scraping the item number from the website, I have confirmed this is working.

#Then use the newly acquired item number to request the data.
item_url = "http://www.example.com/EN/items/Pages/yourrates.aspx?vr=" + item_number[0]
viewstate = r'/wEPD...' # Truncated for brevity.

# Create the appropriate request and payload.
payload = {"vr": int(item_number[0])}

item_request_body = {
        "__SPSCEditMenu": "true",
        "MSOWebPartPage_PostbackSource": "",
        "MSOTlPn_SelectedWpId": "",
        "MSOTlPn_View": 0,
        "MSOTlPn_ShowSettings": "False",
        "MSOGallery_SelectedLibrary": "",
        "MSOGallery_FilterString": "",
        "MSOTlPn_Button": "none",
        "__EVENTTARGET": "",
        "__EVENTARGUMENT": "",
        "MSOAuthoringConsole_FormContext": "",
        "MSOAC_EditDuringWorkflow": "",
        "MSOSPWebPartManager_DisplayModeName": "Browse",
        "MSOWebPartPage_Shared": "",
        "MSOLayout_LayoutChanges": "",
        "MSOLayout_InDesignMode": "",
        "MSOSPWebPartManager_OldDisplayModeName": "Browse",
        "MSOSPWebPartManager_StartWebPartEditingName": "false",
        "__VIEWSTATE": viewstate,
        "keywords": "Search our site",
        "__CALLBACKID": "ctl00$SPWebPartManager1$g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22",
        "__CALLBACKPARAM": "startvr"
    }

# Write the appropriate headers for the property information.
item_request_headers = {
    "Host": home_site,
    "Connection": "keep-alive",
    "Content-Length": len(encoded_valuation_request),
    "Cache-Control": "max-age=0",
    "Origin": home_site,
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
    "Content-Type": "application/x-www-form-urlencoded; charset=UTF-8",
    "Cookie": "__utma=48409910.1174413745.1405662151.1406402487.1406407024.17; __utmb=48409910.7.10.1406407024; __utmc=48409910; __utmz=48409910.1406178827.13.3.utmcsr=ratesandvallandingpage|utmccn=landingpages|utmcmd=button",
    "Accept": "*/*",
    "Referer": valuation_url,
    "Accept-Encoding": "gzip,deflate,sdch",
    "Accept-Language": "en-US,en;q=0.8"
}

response = requests.post(url=item_url, params=payload, data=item_request_body, headers=item_request_headers)
print(response.text)

The request as Chrome reports it:

Remote Address:202.55.96.131:80
Request URL:http://www.example.com/EN/items/Pages/yourrates.aspx?vr=123456789
Request Method:POST
Status Code:200 OK

Request Headers
Accept:*/*
Accept-Encoding:gzip,deflate,sdch
Accept-Language:en-US,en;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Content-Length:21501
Content-Type:application/x-www-form-urlencoded; charset=UTF-8
Cookie:__utma=48409910.1174413745.1405662151.1406402487.1406407024.17; __utmb=48409910.7.10.1406407024; __utmc=48409910; __utmz=48409910.1406178827.13.3.utmcsr=ratesandvallandingpage|utmccn=landingpages|utmcmd=button
Host:www.site.com
Origin:www.site.com
Referer:http://www.example.com/EN/items/Pages/yourrates.aspx?vr=123456789
User-Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36

Query String Parameters
vr:123456789

Form Data
__SPSCEditMenu:true
MSOWebPartPage_PostbackSource:
MSOTlPn_SelectedWpId:
MSOTlPn_View:0
MSOTlPn_ShowSettings:False
MSOGallery_SelectedLibrary:
MSOGallery_FilterString:
MSOTlPn_Button:none
__EVENTTARGET:
__EVENTARGUMENT:
MSOAuthoringConsole_FormContext:
MSOAC_EditDuringWorkflow:
MSOSPWebPartManager_DisplayModeName:Browse
MSOWebPartPage_Shared:
MSOLayout_LayoutChanges:
MSOLayout_InDesignMode:
MSOSPWebPartManager_OldDisplayModeName:Browse
MSOSPWebPartManager_StartWebPartEditingName:false
__VIEWSTATE:/wEPD...(Omitted for length)
keywords:Search our site
__CALLBACKID:ctl00$SPWebPartManager1$g_dbb9e9c7_fe1d_46df_8789_99a6c9db4b22
__CALLBACKPARAM:startvr

1 answer:

Answer 0 (score: 11)

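You have far too many request headers, and you should not set the Content-Type, Content-Length, Host, Origin or Connection headers yourself; leave those to requests to set.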
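As a quick check of that point, the sketch below prepares a form POST without sending it and inspects the headers that requests fills in on its own. The single form field is a placeholder subset of the real form, used only for illustration.

```python
import requests

# Prepare the request without sending it, so no network access is needed.
url = "http://www.example.com/EN/items/Pages/yourrates.aspx"
form = {"__CALLBACKPARAM": "startvr"}  # placeholder subset of the form fields

prepared = requests.Request("POST", url, data=form).prepare()

# requests derives both headers from the encoded body.
content_type = prepared.headers["Content-Type"]
content_length = prepared.headers["Content-Length"]
print(content_type, content_length, prepared.body)
```

A hand-written Content-Length, by contrast, has to match the encoded body exactly, which is easy to get wrong.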

You are also doubling up the URL parameters; either add the vr parameter to the URL manually or use params, not both.
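The doubling can be seen without touching the network by preparing the request and printing the final URL. This is a small sketch; the vr value is the one shown in the Chrome capture.

```python
import requests

base_url = "http://www.example.com/EN/items/Pages/yourrates.aspx"
payload = {"vr": 123456789}

# Let requests build the query string from params.
via_params = requests.Request("POST", base_url, params=payload).prepare().url

# Putting vr in the URL and in params appends the parameter a second time.
doubled = requests.Request(
    "POST", base_url + "?vr=123456789", params=payload
).prepare().url

print(via_params)
print(doubled)
```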

It could well be that some of the parameters in the POST body are generated by the ASP application and linked to a session. I would make a GET request for the page first, parse the form in that page to extract the __CALLBACKID parameter, and send both requests through a Session object. A requests session stores any cookies the server sets and reuses them:

import requests
from bs4 import BeautifulSoup

item_request_headers = {
    "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_9_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1985.125 Safari/537.36",
    "Accept": "*/*",
    "Accept-Encoding": "gzip,deflate,sdch",
    "Accept-Language": "en-US,en;q=0.8"
}

payload = {"vr": int(item_number[0])}
item_url = 'http://www.example.com/EN/items/Pages/yourrates.aspx'

# Session() takes no arguments; set the shared headers on the instance.
session = requests.Session()
session.headers.update(item_request_headers)

# Get the form page (the Referer in the Chrome capture is this same URL).
form_response = session.get(item_url, params=payload)

# Parse the form page; BeautifulSoup could do this, for example.
soup = BeautifulSoup(form_response.content, 'html.parser')
callbackid = soup.select('input[name=__CALLBACKID]')[0]['value']

# viewstate is the value defined in the question's code.
item_request_body = {
    "__SPSCEditMenu": "true",
    "MSOWebPartPage_PostbackSource": "",
    "MSOTlPn_SelectedWpId": "",
    "MSOTlPn_View": 0,
    "MSOTlPn_ShowSettings": "False",
    "MSOGallery_SelectedLibrary": "",
    "MSOGallery_FilterString": "",
    "MSOTlPn_Button": "none",
    "__EVENTTARGET": "",
    "__EVENTARGUMENT": "",
    "MSOAuthoringConsole_FormContext": "",
    "MSOAC_EditDuringWorkflow": "",
    "MSOSPWebPartManager_DisplayModeName": "Browse",
    "MSOWebPartPage_Shared": "",
    "MSOLayout_LayoutChanges": "",
    "MSOLayout_InDesignMode": "",
    "MSOSPWebPartManager_OldDisplayModeName": "Browse",
    "MSOSPWebPartManager_StartWebPartEditingName": "false",
    "__VIEWSTATE": viewstate,
    "keywords": "Search our site",
    "__CALLBACKID": callbackid,
    "__CALLBACKPARAM": "startvr"
}

response = session.post(url=item_url, params=payload, data=item_request_body,
                        headers={'Referer': form_response.url})
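If installing BeautifulSoup is not an option, the same kind of extraction can be sketched with the standard library. The HiddenFieldParser class and the sample HTML below are made up for illustration; a real page would come from the GET response's text. The sketch collects every hidden input, which would also pick up __VIEWSTATE, since ASP.NET renders that as a hidden form field.

```python
from html.parser import HTMLParser


class HiddenFieldParser(HTMLParser):
    """Collect name/value pairs from <input type="hidden"> tags."""

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        is_hidden = (attrs.get("type") or "").lower() == "hidden"
        if tag == "input" and is_hidden and attrs.get("name"):
            self.fields[attrs["name"]] = attrs.get("value") or ""


# Made-up page fragment standing in for the real form page.
sample_html = """
<form method="post">
  <input type="hidden" name="__VIEWSTATE" value="/wEPD-sample" />
  <input type="hidden" name="__EVENTTARGET" value="" />
  <input type="text" name="keywords" value="Search our site" />
</form>
"""

parser = HiddenFieldParser()
parser.feed(sample_html)
hidden_fields = parser.fields
print(hidden_fields)
```

The resulting dictionary can be used as the starting point for the POST body, with only the callback-specific fields added by hand.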

The session handles the headers (it sets the User-Agent and Accept headers); only on the session POST do we also add a Referer header.
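How session-level and per-request headers combine can be checked offline by preparing a request through the session. In this sketch the User-Agent string is a placeholder and nothing is sent over the network.

```python
import requests

session = requests.Session()
session.headers.update({"User-Agent": "example-agent/1.0"})  # placeholder value

request = requests.Request(
    "POST",
    "http://www.example.com/EN/items/Pages/yourrates.aspx",
    data={"__CALLBACKPARAM": "startvr"},
    headers={"Referer": "http://www.example.com/EN/items/Pages/yourrates.aspx?vr=123456789"},
)

# prepare_request merges the session headers with the per-request ones.
prepared = session.prepare_request(request)
merged = dict(prepared.headers)
print(merged["User-Agent"], merged["Referer"])
```

The Referer applies to this one request only; the session's own headers are left unchanged.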