如何在scrapy中执行POST方法?

时间:2016-12-08 14:08:29

标签: python-2.7 web-scraping scrapy screen-scraping scrapy-spider

请一位提供以下网址的帖子方法。

https://www.mygofer.com/furniture/b-34790/rowCount_120?keyword=south%20shore%20furniture

1)上面的URL加载,它提供POST URL和下面的formdata

发布网址= https://www.mygofer.com/lps-mygofer/api/v1/mygofer/search

formdata = {"过滤器":{}," brandFilter":空," sellersFilter":空," catgroupId":" 34790&# 34;," LEVELONE":空,"搜索模式" :" BROWSE"" sortBy":"推荐""关键字":"南%20shore%20furniture&#34 ;, "页次":1," rowCount时" 120" ffmMode" :" ALL"" priceFilter":空," hideOOS":真," UNO":" 4848&#34 ;, "会话" {" GUID":0," EMAILID":""" sessionKey":&# 34; da9d76bd-bd4e-11e6-8e27-00505699251d" "用户id":6026228," APPID":" MYGOFER"}"安全" {" SRC&#34 ;: "网络"" TS":" 2016-12-08T14:01:57.619Z""的authToken" :""}}

2)我已经通过了帖子和网址。 FormRequest中的formdata,但我得到任何回复。

public class SimpleWebProxy : IWebProxy
{
    public ICredentials Credentials { get; set; }

    public Uri GetProxy(Uri destination)
    {
        return destination;
    }

    public bool IsBypassed(Uri host)
    {
        // if return true, service will be very slow.
        return false;
    }

    private static SimpleWebProxy defaultProxy = new SimpleWebProxy();
    public static SimpleWebProxy Default
    {
        get
        {
            return defaultProxy;
        }
    }
}

var client = new RestClient();
client.Proxy = SimpleWebProxy.Default;

2 个答案:

答案 0 :(得分:2)

一些事情:

  • 您要发送的正文已经是JSON编码的,因此您希望使用body参数,而不是formdata(用于URL /编码的键/值对)
  • 您需要指明您的HTTP请求正文的Content-Type(我的Chrome浏览器正在发送Content-Type: application/json;charset=UTF-8
  • 显然,"null"值与网站效果不佳,请使用null

示例shell会话:

$ scrapy shell -s USER_AGENT="Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/55.0.2883.75 Safari/537.36" 'https://www.mygofer.com/furniture/b-34790/rowCount_120?keyword=south%20shore%20furniture'
(...)
>>> frq = scrapy.FormRequest("https://www.mygofer.com/lps-mygofer/api/v1/mygofer/search",
...     method="POST",
...     body='''{"filters":{},
... "brandFilter":null,
... "sellersFilter":null,
... "catgroupId":"34790",
... "levelOne":null,
... "searchMode":"BROWSE",
... "sortBy":"RECOMMENDED",
... "keyword":"south%20shore%20furniture",
... "pageNum":"1",
... "rowCount":"120",
... "ffmMode":"ALL",
... "priceFilter":null,
... "hideOOS":"true",
... "uNo":"4848",
... "session":{"guid":"0",
...           "emailId":"",
...           "sessionKey":"fcd3bcd1-b7bf-11e6-8e27-00505699251d",
...           "userId":"5970776",
...           "appId":"MYGOFER"},
... "security":{"src":"web",
...            "ts":"2016-12-01T12:58:28.994Z",
...            "authToken":""}}''',
...     headers={"Content-Type": "application/json;charset=UTF-8",
...     "Accept":"application/json, text/plain, */*"})
>>> fetch(frq)
2016-12-08 15:50:26 [scrapy] DEBUG: Crawled (200) <POST https://www.mygofer.com/lps-mygofer/api/v1/mygofer/search> (referer: None)
>>> 
>>>
>>> import json
>>> data = json.loads(response.text)
>>> len(data)
3
>>> data.keys()
[u'classType', u'payload', u'userRole']
>>> 
>>> from pprint import pprint
>>> 
>>> pprint(data)
{u'classType': u'com.shc.ecom.local.search.beans.output.SearchOutput',
 u'payload': {u'feature': {},
              u'filters': {u'levelThree': [{u'catGpId': u'28371',
                                            u'catGpPath': u'For the Home_Kids Room_Fun Accessories',
                                            u'count': 1,
                                            u'name': u'Fun Accessories',
                                            u'parentLevel': u'Kids Room',
                                            u'seoPath': u'for-the-home-kids-room-fun-accessories'},
...
                                           {u'catGpId': u'1231474854',
                                            u'catGpPath': u'TVs & Electronics_Media Furniture_TV Stands',
                                            u'count': 69,
                                            u'name': u'TV Stands',
                                            u'parentLevel': u'Media Furniture',
                                            u'seoPath': u'tvs-electronics-media-furniture-tv-stands'}],
                           u'narrowBy': [{u'count': 8,
                                          u'name': u'Double Sided',
                                          u'value': u'Yes'},
                                         {u'count': 4,
                                          u'name': u'Upholstered',
                                          u'value': u'No'},
                                         {u'count': 24,
                                          u'name': u'Mobile',
                                          u'value': u'Yes'},
                                         {u'count': 24,
                                          u'name': u'Fire Resistant',
                                          u'value': u'No'}],
                           u'otherFilters': {u'Assembly': {u'Assembled': 2,
                                                           u'Ready to assemble': 770},
                                             u'Audience': {u'Adult': 262,
                                                           u'All ages': 7,
                                                           u'Dorm/College': 2,
                                                           u'Kids': 351,
                                                           u'Teen': 12},
...
                                             u'Width Range (in.)': {u'12 - 24 in.': 8,
                                                                    u'25 - 36 in.': 106,
                                                                    u'37 - 48 in.': 32,
                                                                    u'49 - 60 in.': 70,
                                                                    u'61 - 72 in.': 4,
                                                                    u'Less than 12 in.': 2}},
                           u'priceRanges': [{u'cnt': u'262',
                                             u'high': u'100',
                                             u'low': u'0'},
                                            {u'cnt': u'269',
                                             u'high': u'150',
                                             u'low': u'100'},
                                            {u'cnt': u'251',
                                             u'high': u'200',
                                             u'low': u'150'},
                                            {u'cnt': u'219',
                                             u'high': u'275',
                                             u'low': u'200'},
                                            {u'cnt': u'94',
                                             u'high': u'above',
                                             u'low': u'275'}]},
              u'keyword': u'south%20shore%20furniture',
              u'levelOne': {u'catGpId': u'34790',
                            u'catGpPath': u'Furniture',
                            u'name': u'Furniture',
                            u'seoPath': u'furniture'},
              u'maxPrice': u'2539.19',
              u'minPrice': u'12.65',
              u'numFound': u'1095',
              u'products': [{u'availFFMs': [u'SHIP'],
                             u'brand': u'South Shore',
                             u'ffm': u'VD',
                             u'freeShip': u'0',
                             u'img': u'http://c.shld.net/rpx/i/s/pi/mp/20571/prod_6578221517?src=http%3A%2F%2Fak1.ostkcdn.com%2Fimages%2Fproducts%2F9810550%2FSouth-Shore-Willow-Twin-Bookcase-Headboard-39-Sumptuous-Cherry-0da3d88a-cb6a-4048-80d4-be464e85da49.jpg&d=8d8fee1e07dc750e2fb7c5711a500bf32278595c',
                             u'isInCart': False,
                             u'itemPartNumber': u'SPM9120228717',
                             u'mailable': u'1',
                             u'mfpartno': u'3356098-9810550',
                             u'name': u'South Shore Willow Twin Bookcase Headboard  Sumptuous Cherry',
                             u'partNumber': u'SPM9120228717',
                             u'prdType': u'NONVARIATION',
                             u'price': {u'mapViolation': False,
                                        u'pid': u'SPM9120228717'},
                             u'qtyInCart': 0,
                             u'rating': 0.0,
                             u'reviews': 0,
                             u'salePrice': 87.11,
                             u'shipStock': u'1',
                             u'soldBy': u'Overstock.com',
                             u'solrSalePrice': 87.11,
                             u'storePrice': False,
                             u'type': u'NONVARIATION'},
...
                            {u'availFFMs': [u'SHIP'],
                             u'brand': u'South Shore',
                             u'ffm': u'VD',
                             u'freeShip': u'1',
                             u'img': u'http://c.shld.net/rpx/i/s/i/spin/image/spin_prod_204451401',
                             u'isInCart': False,
                             u'itemPartNumber': u'00827455000',
                             u'mailable': u'1',
                             u'mfpartno': u'7250767',
                             u'name': u'Axess Collection 4-Shelf Bookcasen Pure White',
                             u'partNumber': u'00827455000P',
                             u'prdType': u'NONVARIATION',
                             u'price': {u'clearancePrice': u'0.00',
                                        u'mapViolation': False,
                                        u'pid': u'00827455000',
                                        u'priceType': u'P',
                                        u'promoPrice': u'67.49',
                                        u'regularPrice': u'74.99',
                                        u'salePrice': u'67.49',
                                        u'savings': u'7.5'},
                             u'qtyInCart': 0,
                             u'rating': 0.0,
                             u'reviews': 0,
                             u'salePrice': 67.49,
                             u'shipStock': u'1',
                             u'soldBy': u'Sears',
                             u'solrSalePrice': 59.71,
                             u'storePrice': False,
                             u'type': u'NONVARIATION'}],
              u'query': u'http://solrx416p.prod.ch4.s.com:8380/search/select?qt=simpleallsubcat&q=south%20shore%20furniture&wt=json&start=0&rows=120&fq=catalogs:("27151")&fq=level1Cats:("27151_Furniture")&fq=storeAttributes:(!"10175_OUTOFSTOCK_INDICATOR=1")&fq=!(storeAttributes:("10175_DEFAULT_FULFILLMENT=DDC" OR "10175_DEFAULT_FULFILLMENT=KRES" OR "10175_DEFAULT_FULFILLMENT=CRES" OR "10175_DEFAULT_FULFILLMENT=DRES" OR "10175_DEFAULT_FULFILLMENT=SRES" OR "10175_DEFAULT_FULFILLMENT=PLSFS"))&sort=instock desc,fulfillment desc,imageStatus desc,score desc&clientID=MyGofer&sortPrefix=4848~10175&globalPrefix=4848,10175',
              u'relevancyRedirect': False,
              u'status': u'success',
              u'twItems': []},
 u'userRole': None}
>>> 

答案 1 :(得分:0)

这是 5 年后,但您可以使用 scrapy.http.JsonRequest 处理 JSON 有效负载 - 文档中的示例:

data = {
'name1': 'value1',
'name2': 'value2',
}

yield JsonRequest(url='http://www.example.com/post/action', data=data)