Scraping ticket and fare data from thetrainline.com

Date: 2017-03-21 15:56:59

Tags: python python-3.x web-scraping beautifulsoup python-requests

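Hello fellow programmers. I'm trying to write some Python code to scrape information from thetrainline.com. I only recently started programming, and I can't figure out how to extract the data from the response to a POST request. Please see the details below.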

Here is my code so far:

import requests
from bs4 import BeautifulSoup

# Search form endpoint and the form fields to submit
postURL = 'https://www.thetrainline.com/buytickets/'
predata = {'OriginStation': 'Stockport',
           'DestinationStation': 'Birmingham New Street',
           'RouteRestriction': 'NULL',
           'ViaAvoidStation': '',
           'journeyTypeGroup': 'return',
           'outwardDate': '14-Apr-17',
           'OutwardLeaveAfterOrBefore': 'A',
           'OutwardHour': '15',
           'OutwardMinute': '15',
           'returnDate': '16-Apr-17',
           'InwardLeaveAfterOrBefore': 'A',
           'ReturnHour': '9',
           'ReturnMinute': '0',
           'AdultsTravelling': '1',
           'ChildrenTravelling': '0',
           'railCardsType_0': 'YNG',
           'railCardNumber_0': '1',
           'ExtendedSearch': 'Get times & tickets'}

# Browser-like User-Agent header
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36'}

# Submit the search form as a POST request
postform = requests.post(postURL, headers=headers, data=predata)

# Parse the returned HTML and find the element with id="timetable"
soup = BeautifulSoup(postform.content, 'html.parser')
table = soup.find(id='timetable')

If I enter "table" in the interactive shell, I get the following:

>>> table
<form action="combinedmatrix.aspx" class="form matrix matrix-search-outdep matrix-search-returndep" data-defaults='{"adultPassengers":1,"canChangeJourney":true,"canPreselectTicket":true,"childPassengers":0,"destinationName":"Birmingham New Street","fullJourneys":[{"cheapestTickets":[{"label":"Cheapest Standard Single","tickets":[{"code":"MBS","departureTime":"15:16","groupIdentifier":"cheapest","isCheapest":true,"journeyId":1,"price":"9.30","value":"{\"ArrivalStationCode\":\"BHM\",\"DepartureStationCode\":\"SPT\",\"Id\":208,\"JourneyArrivalDate\":\"\\\/Date(1492183980000+0100)\\\/\",\"JourneyDepartureDate\":\"\\\/Date(1492179360000+0100)\\\/\",\"Price\":9.3,\"PriceInPounds\":\"£9.30\",\"Type\":2}"},{"code":"MBS","departureTime":"15:36","groupIdentifier":"cheapest","isCheapest":true,"journeyId":2,"price":"9.30","value":"{\"ArrivalStationCode\":\"BHM\",\"DepartureStationCode\":\"SPT\",\"Id\":208,\"JourneyArrivalDate\":\"\\\/Date(1492185480000+0100)\\\/\",\"JourneyDepartureDate\":\"\\\/Date(1492180560000+0100)\\\/\",\"Price\":9.3,\"PriceInPounds\":\"£9.30\",\"Type\":2}"},{"code":"SVS","departureTime":"15:40","groupIdentifier":"cheapest","journeyId":3,"price":"23.85","value":"{\"ArrivalStationCode\":\"BHM\",\"DepartureStationCode\":\"SPT\",\"Id\":253,\"JourneyArrivalDate\":\"\\\/Date(1492186680000+0100)\\\/\",\"JourneyDepartureDate\":\"\\\/Date(1492180800000+0100)\\\/\",\"Price\":23.85,\"PriceInPounds\":\"£23.85\",\"Type\":2}"},{"code":"MBS","departureTime":"16:16","groupIdentifier":"cheapest","isCheapest":true,"journeyId":4,"price":"9.30","value":"{\"ArrivalStationCode\":\"BHM\",\"DepartureStationCode\":\"SPT\",\"Id\":208,\"JourneyArrivalDate\":\"\\\/Date(1492187580000+0100)\\\/\",\"JourneyDepartureDate\":\"\\\/Date(1492182960000+0100)\\\/\",\"Price\":9.3,\"PriceInPounds\":\"£9.30\",\"Type\":2}"}],"ticketsType":"S"},{"label":"Cheapest First Class 
Single","tickets":[{"code":"MBF","departureTime":"15:16","groupIdentifier":"cheapest","journeyId":1,"price":"24.30","value":"{\"ArrivalStationCode\":\"BHM\",\"DepartureStationCode\":\"SPT\",\"Id\":210,\"JourneyArrivalDate\":\"\\\/Date(1492183980000+0100)\\\/\",\"JourneyDepartureDate\":\"\\\/Date(1492179360000+0100)\\\/\",\"Price\":24.3,\"PriceInPounds\":\"£24.30\",\"Type\":2}"},{"code":"MBF","departureTime":"15:36","groupIdentifier":"cheapest","journeyId":2,"price":"24.30","value":"{\"ArrivalStationCode\":\"BHM\",\"DepartureStationCode\":\"SPT\",\"Id\":210,\"JourneyArrivalDate\":\"\\\/Date(1492185480000+0100)\\\/\",\"JourneyDepartureDate\":\"\\\/Date(1492180560000+0100)\\\/\",\"Price\":24.3,\"PriceInPounds\":\"£24.30\",\"Type\":2}"},
...
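In the output above, each ticket's "value" field is itself a JSON-encoded string, and its dates use the /Date(milliseconds+offset)/ format emitted by .NET JSON serializers. Below is a minimal sketch for decoding one such string. It assumes every "value" has the same shape as the samples shown; SAMPLE_VALUE, parse_dotnet_date and decode_ticket_value are hypothetical names, and the sample is the first ticket's string as it looks once the outer data-defaults JSON has been decoded. The millisecond count is treated as UTC time since the Unix epoch; for this ticket that gives 15:16 at +01:00, which matches its departureTime.

```python
import json
import re
from datetime import datetime, timedelta, timezone

# First ticket's "value" string from the output above, as it looks once the
# outer data-defaults JSON has been decoded (so "\/" is a single escape).
SAMPLE_VALUE = (
    '{"ArrivalStationCode":"BHM","DepartureStationCode":"SPT","Id":208,'
    '"JourneyArrivalDate":"\\/Date(1492183980000+0100)\\/",'
    '"JourneyDepartureDate":"\\/Date(1492179360000+0100)\\/",'
    '"Price":9.3,"PriceInPounds":"£9.30","Type":2}'
)

# Matches /Date(1492179360000+0100)/ with an optional +HHMM or -HHMM suffix.
DOTNET_DATE = re.compile(r"/Date\((-?\d+)(?:([+-])(\d{2})(\d{2}))?\)/")


def parse_dotnet_date(text):
    """Convert a /Date(ms+HHMM)/ string into a timezone-aware datetime."""
    match = DOTNET_DATE.fullmatch(text)
    if match is None:
        raise ValueError(f"not a /Date(...)/ string: {text!r}")
    millis, sign, hours, minutes = match.groups()
    offset = timedelta(0)
    if sign:
        offset = timedelta(hours=int(hours), minutes=int(minutes))
        if sign == "-":
            offset = -offset
    # The millisecond count is UTC-based; the suffix only picks the display offset.
    return datetime.fromtimestamp(int(millis) / 1000, tz=timezone(offset))


def decode_ticket_value(value):
    """Decode a ticket's nested JSON "value" string and convert its dates."""
    info = json.loads(value)
    for key in ("JourneyDepartureDate", "JourneyArrivalDate"):
        info[key] = parse_dotnet_date(info[key])
    return info


ticket = decode_ticket_value(SAMPLE_VALUE)
print(ticket["DepartureStationCode"], ticket["JourneyDepartureDate"], ticket["Price"])
```

In the real script the same helper would be applied to the "value" string of each ticket in the decoded data-defaults structure.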

How would you suggest extracting the data set from the POST response?

Many thanks for your help.

1 answer:

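Answer 0 (score: 0)

The data is already in the page you downloaded: the element returned by soup.find(id='timetable') is a form whose data-defaults attribute holds a JSON string, so it can be decoded with the json module: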

In [8]: import json

In [9]: json.loads(table.get('data-defaults'))
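Building on this answer, here is a minimal sketch of turning the decoded dictionary into one flat record per ticket. It assumes the nesting visible in the question's output (fullJourneys, then cheapestTickets, then tickets); SAMPLE_ATTRIBUTE is a hand-trimmed copy of that output and flatten_cheapest_tickets is a hypothetical helper. In the real script you would pass it json.loads(table.get('data-defaults')) instead of the sample.

```python
import json

# Hand-trimmed copy of the data-defaults JSON from the question's output.
# The real attribute carries many more keys and tickets (and a "value" string per ticket).
SAMPLE_ATTRIBUTE = """
{"adultPassengers": 1,
 "destinationName": "Birmingham New Street",
 "fullJourneys": [
  {"cheapestTickets": [
    {"label": "Cheapest Standard Single", "ticketsType": "S",
     "tickets": [
      {"code": "MBS", "departureTime": "15:16", "journeyId": 1, "price": "9.30"},
      {"code": "SVS", "departureTime": "15:40", "journeyId": 3, "price": "23.85"}]},
    {"label": "Cheapest First Class Single",
     "tickets": [
      {"code": "MBF", "departureTime": "15:16", "journeyId": 1, "price": "24.30"}]}]}]}
"""


def flatten_cheapest_tickets(defaults):
    """Return one flat dict per ticket from a decoded data-defaults dict."""
    rows = []
    for journey in defaults.get("fullJourneys", []):
        for group in journey.get("cheapestTickets", []):
            for ticket in group.get("tickets", []):
                rows.append({
                    "label": group.get("label"),
                    "code": ticket.get("code"),
                    "departureTime": ticket.get("departureTime"),
                    "journeyId": ticket.get("journeyId"),
                    # Prices arrive as strings such as "9.30"
                    "price": float(ticket["price"]),
                })
    return rows


defaults = json.loads(SAMPLE_ATTRIBUTE)
for row in flatten_cheapest_tickets(defaults):
    print(row["label"], row["code"], row["departureTime"], row["price"])
```

A flat list of dicts like this can be written out directly with csv.DictWriter or loaded into a pandas DataFrame.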