How do I parse all content with a specific ID using BeautifulSoup 4?

Asked: 2018-10-18 14:35:21

Tags: python html beautifulsoup

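Basically, I'm writing a Python script that should return a list of all of today's train departure times from a given stop (as you can see in the POST parameters), but for some reason it returns only the last train.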

Current code:

import requests
from bs4 import BeautifulSoup
import datetime
import calendar


def get_todays_trains():
    now = datetime.datetime.now()

    url = 'https://www.cp.pt/sites/passageiros/en/train-times/Train-time-results'

    r = requests.post(url, allow_redirects=False, data={
        'arrival': 'Porto - Campanha',
        'depart': 'Aguas Santas - Palmilheira',
        'departDate': str(now.year) + '-' + str(now.month) + '-' + str(now.day),
        'Date': str(now.day) + ' ' + calendar.month_name[now.month] + ', ' + str(now.year)
    })

    html = r.text
    soup = BeautifulSoup(html, 'html.parser')

    for row in soup.findAll('tbody')[1].tbody.findAll('tr'):
        depart = row.findAll('td')[2]

    print(depart)
    print('departDate: ' + str(now.year) + '-' + str(now.month) + '-' + str(now.day))
    print('Date: ' + str(now.day) + ' ' + calendar.month_name[now.month] + ', ' + str(now.year))

    return depart


get_todays_trains()

If you'd rather not visit the page, here is a trimmed-down version of its HTML:

https://pastebin.com/bfkAr6sH

1 answer:

Answer 0 (score: 0)

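As Robin said, you have to put the temporary values into a list and return that list. Inside the loop, depart is reassigned on every iteration, so only the last row's value survives. My suggestion is to create a dictionary that holds all the values, such as the departure date and any other data you need. Something like: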

# This fragment replaces the loop and return inside get_todays_trains()
train_data = dict()
train_data['departing_date'] = str(now.year) + '-' + str(now.month) + '-' + str(now.day)
train_data['other_data'] = 'something you need'
train_data['departing_trains'] = []
for row in soup.findAll('tbody')[1].tbody.findAll('tr'):
    depart = row.findAll('td')[2]
    # Append each row's value instead of overwriting a single variable
    train_data['departing_trains'].append(depart)
return train_data

The returned dictionary is easy to work with, and it is also more Pythonic.
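To make the idea concrete, here is a minimal self-contained sketch of collecting every row rather than only the last one. It runs against a small made-up table, not the real page: SAMPLE_HTML, collect_departures and build_train_data are hypothetical names, the train numbers and times are invented, and the assumption that the departure time sits in the third cell comes only from the index used in the question.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with invented values; the real page's
# structure may well differ from this.
SAMPLE_HTML = """
<table>
  <tbody>
    <tr><td>Urban</td><td>15702</td><td>06:12</td><td>06:25</td></tr>
    <tr><td>Urban</td><td>15704</td><td>07:02</td><td>07:15</td></tr>
    <tr><td>Urban</td><td>15706</td><td>08:32</td><td>08:45</td></tr>
  </tbody>
</table>
"""


def collect_departures(html, column=2):
    """Return the text of the given cell index for every table row."""
    soup = BeautifulSoup(html, 'html.parser')
    departures = []
    for row in soup.find_all('tr'):
        cells = row.find_all('td')
        # Skip header rows or rows with too few cells
        if len(cells) > column:
            # Append inside the loop so no row is overwritten
            departures.append(cells[column].get_text(strip=True))
    return departures


def build_train_data(html, departing_date):
    """Bundle the date and all departure times into one dictionary."""
    return {
        'departing_date': departing_date,
        'departing_trains': collect_departures(html),
    }


if __name__ == '__main__':
    print(build_train_data(SAMPLE_HTML, '2018-10-18'))
```

Using get_text(strip=True) stores plain strings instead of Tag objects, which tends to be easier to print or serialize later. If you need the tags themselves, append cells[column] instead.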

Hope this helps! Cheers!