Script gets stuck when sending a POST request with parameters

Time: 2021-01-31 16:57:01

Tags: python json python-3.x web-scraping python-requests

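I'm trying to get a JSON response from a webpage by issuing a POST HTTP request with the appropriate parameters. When I run the script, it gets stuck and produces no result. It doesn't throw any error either. Here is the site link. Before clicking the Get times & tickets button, I selected three options from the three dropdowns in this form on that site.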

I've tried:

import requests
from bs4 import BeautifulSoup

url = 'https://www.thetrainline.com/'
link = 'https://www.thetrainline.com/api/journey-search/'

payload = {"passengers":[{"dateOfBirth":"1991-01-31"}],"isEurope":False,"cards":[],"transitDefinitions":[{"direction":"outward","origin":"1f06fc66ccd7ea92ae4b0a550e4ddfd1","destination":"7c25e933fd14386745a7f49423969308","journeyDate":{"type":"departAfter","time":"2021-02-11T22:45:00"}}],"type":"single","maximumJourneys":4,"includeRealtime":True,"applyFareDiscounts":True}

with requests.Session() as s:
    s.headers['User-Agent'] = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.104 Safari/537.36'
    s.headers['content-type'] = 'application/json'
    s.headers['accept'] = 'application/json'
    r = s.post(link,json=payload)
    print(r.status_code)
    print(r.json())

How can I get the JSON response by issuing a POST request with the parameters from that site?

2 Answers:

Answer 0 (score: 2)

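You are missing required headers: x-version and referer. The referer header points to the search form's results URL, which you can build yourself. You also have to post an availability request before the journey-search request.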

import requests
from requests.models import PreparedRequest

headers = {
    'authority': 'www.thetrainline.com',
    'pragma': 'no-cache',
    'cache-control': 'no-cache',
    'x-version': '2.0.18186',
    'dnt': '1',
    'accept-language': 'en-GB',
    'sec-ch-ua-mobile': '?0',
    'user-agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 11_1_0) AppleWebKit/537.36 (KHTML, like Gecko) '
                  'Chrome/88.0.4324.96 Safari/537.36',
    'content-type': 'application/json',
    'accept': 'application/json',
    'origin': 'https://www.thetrainline.com',
    'sec-fetch-site': 'same-origin',
    'sec-fetch-mode': 'cors',
    'sec-fetch-dest': 'empty',
}

with requests.Session() as s:
    origin = "6e2242b3f38bbbd8d8124e1d84d319e1"
    destination = "15bcf02bc44ea754837c8cf14569f608"
    localDateTime = "2021-02-03T19:30:00"
    dateOfBirth = "1991-02-03"
    passenger_type = "single"

    # Build the referer URL that the search form would produce.
    req = PreparedRequest()
    params = {
        "origin": origin,
        "destination": destination,
        "outwardDate": localDateTime,
        "outwardDateType": "departAfter",
        "journeySearchType": passenger_type,
        "passengers[]": dateOfBirth
    }
    req.prepare_url("https://www.thetrainline.com/book/results", params)

    headers.update({"referer": req.url})
    s.headers = headers

    payload_availability = {
        "origin": origin,
        "destination": destination,
        "outwardDefinition": {
            "localDateTime": localDateTime,
            "searchMethod": "DEPARTAFTER"
        },
        "passengerBirthDates": [{
            "id": "PASSENGER-0",
            "dateOfBirth": dateOfBirth
        }],
        "maximumNumberOfJourneys": 4,
        "discountCards": []
    }
    r = s.post('https://www.thetrainline.com/api/coaches/availability', json=payload_availability)
    r.raise_for_status()

    payload_search = {
        "passengers": [{"dateOfBirth": dateOfBirth}],
        "isEurope": False,
        "cards": [],
        "transitDefinitions": [{
            "direction": "outward",
            "origin": origin,
            "destination": destination,
            "journeyDate": {
                "type": "departAfter",
                "time": localDateTime}
        }],
        "type": passenger_type,
        "maximumJourneys": 4,
        "includeRealtime": True,
        "applyFareDiscounts": True
    }
    r = s.post('https://www.thetrainline.com/api/journey-search/', json=payload_search)
    r.raise_for_status()

    print(r.json())
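
As a side note, the referer URL built above with PreparedRequest can also be assembled with the standard library alone. The following is a minimal sketch under that assumption; the helper name build_referer and the constant RESULTS_URL are hypothetical, and the query parameter names are simply copied from the code above rather than taken from any documented API.

```python
from urllib.parse import urlencode

# Results page used as the base of the referer in the answer above.
RESULTS_URL = "https://www.thetrainline.com/book/results"


def build_referer(origin, destination, outward_date, date_of_birth,
                  journey_type="single"):
    """Return the results-page URL to send as the referer header.

    Hypothetical helper: the parameter names mirror the params dict above.
    """
    params = {
        "origin": origin,
        "destination": destination,
        "outwardDate": outward_date,
        "outwardDateType": "departAfter",
        "journeySearchType": journey_type,
        "passengers[]": date_of_birth,
    }
    # urlencode percent-encodes reserved characters such as ':' and '[]'.
    return f"{RESULTS_URL}?{urlencode(params)}"


referer = build_referer(
    "6e2242b3f38bbbd8d8124e1d84d319e1",
    "15bcf02bc44ea754837c8cf14569f608",
    "2021-02-03T19:30:00",
    "1991-02-03",
)
print(referer)
```

The resulting string can be assigned to the referer header in the same way as req.url above.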

Answer 1 (score: 0)

In Sers's reply, the headers are missing.

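When scraping a website, you have to keep its anti-scraping mechanisms in mind. The site takes your IP address, request headers, cookies, and various other factors into account when deciding whether to block your requests.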
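
One practical consequence is that a blocked request may never receive a response, and requests waits indefinitely unless a timeout is given, which would match the "stuck with no error" symptom in the question. Below is a minimal sketch, assuming a (connect, read) timeout of 5 and 15 seconds; the helper name post_json and those values are illustrative choices, not something the site requires.

```python
import requests


def post_json(session, url, payload, timeout=(5, 15)):
    """POST a JSON payload and return the decoded JSON body.

    Returns None if the server does not answer within `timeout`
    (connect seconds, read seconds) instead of hanging forever.
    """
    try:
        response = session.post(url, json=payload, timeout=timeout)
    except requests.exceptions.Timeout:
        # Covers both connect and read timeouts.
        return None
    # Surface 4xx/5xx responses (for example a block page) as exceptions.
    response.raise_for_status()
    return response.json()
```

It could be called as post_json(s, link, payload) with the session and payload from the question. A None result would then suggest the request is being silently dropped rather than rejected with an error status.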