Correctly constructing an XHR request with requests [python]

Time: 2016-08-01 17:18:14

Tags: javascript python

I am trying to scrape a website that generates its data with JavaScript. I have done enough reading here to know that the way to scrape such pages is:

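  1. Watch the Network tab in Firebug to see which requests are made when you perform the search.
  2. Isolate the XHR requests and recreate them in the script.
  3. So, when I do step 1, a POST request is sent to the link shown in this screenshot, where you can also see the response it gets. Looks great, right?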

    But when I try to recreate that request in Python, using the payload I see under Firebug's Post tab, like this:

    import requests
    from bs4 import BeautifulSoup
    
    payload = {"Max": 999, "RectCoord": "89,-179,-89,179", "Source": "", "SortField": "NEWID()",
               "OfficeName": "", "FirstName": "", "LastName": "da", "CityName": "", "ZipCode": "",
               "Category": "S", "SecLanguageReq": "", "OfficeCode": ""}
    
    r = requests.post('http://search.cnyrealtor.com/MyAjaxService.asmx/MemberSearch', data=payload)
    
    print(r.content)
    

    I get a page showing the error message: Request format is unrecognized for URL unexpectedly ending in '/MemberSearch'

    So my question is: why do I get this error when the response in Firebug is fine? Am I missing something in the requests.post(url) line of my Python script?
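A quick way to compare the script's request with the one Firebug shows is to build it with requests but only prepare it, without sending it. The minimal sketch below reuses the payload and URL from the question; `Request.prepare()` performs no network I/O, so it runs offline. It shows that a dict passed as `data=` is sent form-encoded with an `application/x-www-form-urlencoded` Content-Type rather than as JSON. Treating that mismatch as the cause of the error is an assumption here, though it is the same point the answer makes.

```python
import requests

# Payload and URL copied from the question.
payload = {"Max": 999, "RectCoord": "89,-179,-89,179", "Source": "", "SortField": "NEWID()",
           "OfficeName": "", "FirstName": "", "LastName": "da", "CityName": "", "ZipCode": "",
           "Category": "S", "SecLanguageReq": "", "OfficeCode": ""}
url = "http://search.cnyrealtor.com/MyAjaxService.asmx/MemberSearch"

# Build the same request the question sends, but only prepare it: nothing is sent.
prepared = requests.Request("POST", url, data=payload).prepare()

# A dict passed as data= is form-encoded, not serialised as JSON.
print(prepared.headers["Content-Type"])  # application/x-www-form-urlencoded
print(prepared.body)                     # key=value pairs joined by "&"
```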

1 answer:

Answer 0 (score: 1)

You need to dump the dictionary to JSON and send that as the payload. It is also important to set the Content-Type request header:

import json
import requests

payload = {"Max": 999, "RectCoord": "89,-179,-89,179", "Source": "", "SortField": "NEWID()", "OfficeName": "",
           "FirstName": "", "LastName": "", "CityName": "", "ZipCode": "", "Category": "S", "SecLanguageReq": "",
           "OfficeCode": ""}

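with requests.Session() as session:
    # Visit the search page first; the Session keeps any cookies it sets for the POST below.
    session.get("http://search.cnyrealtor.com/SiteContent/SYR/MemberSearchSYR.aspx")
    # Send the payload as a JSON string and declare it as JSON, mirroring the browser's XHR request.
    r = session.post('http://search.cnyrealtor.com/MyAjaxService.asmx/MemberSearch', data=json.dumps(payload),
                     headers={"Content-Type": "application/json; charset=UTF-8"})

    print(r.content)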
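requests can also serialise the dictionary itself: passing `json=payload` instead of `data=json.dumps(payload)` encodes the body as JSON and sets `Content-Type: application/json` when no Content-Type header is supplied. The sketch below compares the two forms offline with `Request.prepare()`, so nothing reaches the server. Whether this ASMX endpoint accepts `application/json` without the `; charset=UTF-8` parameter is an untested assumption; the explicit header used in the answer is the conservative choice.

```python
import json

import requests

payload = {"Max": 999, "RectCoord": "89,-179,-89,179", "Source": "", "SortField": "NEWID()",
           "OfficeName": "", "FirstName": "", "LastName": "", "CityName": "", "ZipCode": "",
           "Category": "S", "SecLanguageReq": "", "OfficeCode": ""}
url = "http://search.cnyrealtor.com/MyAjaxService.asmx/MemberSearch"

# The answer's form: serialise by hand and set the header explicitly.
manual = requests.Request(
    "POST", url,
    data=json.dumps(payload),
    headers={"Content-Type": "application/json; charset=UTF-8"},
).prepare()

# Shortcut: let requests serialise the dict via json=.
# Assumption: the server accepts plain application/json (not verified for this endpoint).
auto = requests.Request("POST", url, json=payload).prepare()

print(manual.headers["Content-Type"])  # application/json; charset=UTF-8
print(auto.headers["Content-Type"])    # application/json

# Both bodies decode to the same dictionary.
print(json.loads(manual.body) == json.loads(auto.body))  # True
```

With a live session the shortcut would read `session.post(url, json=payload)`; the rest of the answer's code stays the same.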