500 Internal Server Error from a third-party API

Date: 2018-07-05 20:54:46

Tags: json python-3.x api web-scraping scrapy

Python 3.6, Scrapy 1.5

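I'm scraping the John Deere warranty page to watch for all new PMPs and their expiration dates. Looking at the network traffic between the browser and the website, I found a REST API that feeds data to the page.

Now I'm trying to fetch the JSON data from the API instead of scraping the content of the JavaScript-rendered page. However, I get an Internal Server Error and I don't know why.

I'm using Scrapy to log in and capture the data.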

import scrapy

class PmpSpider(scrapy.Spider):
    name = 'pmp'
    start_urls = ['https://jdwarrantysystem.deere.com/portal/']

    def parse(self, response):

        self.log('***Form Request***')
        login ={
            'USERNAME':*******,
            'PASSWORD':*******
            }
        yield scrapy.FormRequest.from_response(
            response,
            url = 'https://registration.deere.com/servlet/com.deere.u90950.registrationlogin.view.servlets.SignInServlet',
            method = 'POST', formdata = login, callback = self.parse_pmp
        )
        self.log('***PARSE LOGIN***')

    def parse_pmp(self, response):
        self.log('***PARSE PMP***')
        cookies = response.headers.getlist('Set-Cookie')
        for cookie in cookies:
            cookie = cookie.decode('utf-8')
            self.log(cookie)
            cook = cookie.split(';')[0].split('=')[1]
            path = cookie.split(';')[1].split('=')[1]
            domain = cookie.split(';')[2].split('=')[1]
        yield scrapy.Request(
            url = 'https://jdwarrantysystem.deere.com/api/pip-products/collection',
            method = 'POST',
            cookies = {
                'SESSION':cook,
                'path':path,
                'domain':domain
            },
            headers = {
            "Accept":"application/json",
            "accounts":["201445","201264","201167","201342","201341","201221"],
            "excludedPin":"",
            "export":"",
            "language":"",
            "metric":"Y",
            "pipFilter":"OPEN",
            "pipType":["MALF","SAFT"]
            },
            meta = {'dont_redirect': True},
            callback = self.parse_pmp_list
        )

    def parse_pmp_list(self, response):
        self.log('***LISTA PMP***')
        self.log(response.body)

Why am I getting this error? How can I get the data from this API?

2018-07-05 17:26:19 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST https://jdwarrantysystem.deere.com/api/pip-products/collection> (failed 1 times): 500 Internal Server Error
2018-07-05 17:26:20 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <POST https://jdwarrantysystem.deere.com/api/pip-products/collection> (failed 2 times): 500 Internal Server Error
2018-07-05 17:26:21 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <POST https://jdwarrantysystem.deere.com/api/pip-products/collection> (failed 3 times): 500 Internal Server Error
2018-07-05 17:26:21 [scrapy.core.engine] DEBUG: Crawled (500) <POST https://jdwarrantysystem.deere.com/api/pip-products/collection> (referer: https://jdwarrantysystem.deere.com/portal/)
2018-07-05 17:26:21 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <500 https://jdwarrantysystem.deere.com/api/pip-products/collection>: HTTP status code is not handled or not allowed


1 answer:

Answer 0 (score: 0)

I found the problem: this is a POST request, so the data must be sent as a JSON-formatted body because, unlike a GET request, the parameters are not in the URI. The request headers also need "content-type": "application/json". See: "How parameters are sent in POST request" and "Rest POST in python". So, edit the function parse_pmp to send the parameters in the request body and add this header:

"content-type": "application/json"

Everything works!
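The edited function itself is not shown in the answer, so the following is only a minimal sketch of the fix it describes, not the author's exact code. It assumes that every parameter the question placed in the headers (other than Accept) belongs in the JSON body, and that passing only the SESSION cookie is enough. The helper name build_pmp_request_kwargs is hypothetical.

```python
import json

API_URL = 'https://jdwarrantysystem.deere.com/api/pip-products/collection'


def build_pmp_request_kwargs(session_cookie):
    """Build keyword arguments for scrapy.Request (hypothetical helper).

    Assumption: the API expects these parameters in a JSON body,
    not in the request headers as in the question's code.
    """
    payload = {
        "accounts": ["201445", "201264", "201167", "201342", "201341", "201221"],
        "excludedPin": "",
        "export": "",
        "language": "",
        "metric": "Y",
        "pipFilter": "OPEN",
        "pipType": ["MALF", "SAFT"],
    }
    return {
        "url": API_URL,
        "method": "POST",
        # Assumption: only the session value is a real cookie; path and
        # domain are cookie attributes, so they are not sent as cookies.
        "cookies": {"SESSION": session_cookie},
        "headers": {
            "Accept": "application/json",
            # Tells the server that the body is JSON.
            "Content-Type": "application/json",
        },
        # Serialize the parameters into the POST body.
        "body": json.dumps(payload),
        "meta": {"dont_redirect": True},
    }
```

Inside parse_pmp, the request could then be issued with `yield scrapy.Request(callback=self.parse_pmp_list, **build_pmp_request_kwargs(cook))`, and the response decoded in parse_pmp_list with `json.loads(response.body)`.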