Selenium vs. the Metropolitan Opera

Posted: 2018-03-05 21:40:11

Tags: python selenium web-scraping

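First, I must apologize in advance: I'm pretty much a newbie here and this is my first question, so please be kind...

I'm trying to scrape JavaScript-generated pages, specifically the Metropolitan Opera's schedules. For any given month, I want to build a calendar containing only the production name and the performance date and time. I've thrown Beautiful Soup and Selenium at it, and I can get plenty of information about the composers' love lives and so on, but not those three elements. Any help would be greatly appreciated.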

Link to a random month in their schedule

1 Answer:

Answer 0 (score: 1)

One thing you should look for on websites (in the future) is API calls. I opened Chrome DevTools (F12) and reloaded the page with the Network tab open.
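Once a request shows up in the Network tab, you can usually reproduce it outside the browser. As a minimal standard-library sketch (the helper name `build_events_url` is hypothetical; the base URL and the `month`/`year` parameters are the ones used in the code further down), this rebuilds the query URL that `requests.get(..., params=...)` should send, so you can compare it with the request URL DevTools shows:

```python
from urllib.parse import urlencode

# Base URL as used in the code in this answer.
BASE_URL = 'http://www.metopera.org/api/v1/calendar'


def build_events_url(month, year):
    """Hypothetical helper: build the GET URL for the 'events' endpoint."""
    # urlencode turns the dict into 'month=11&year=2018' (insertion order kept).
    query = urlencode({'month': month, 'year': year})
    return '{}/events?{}'.format(BASE_URL, query)


if __name__ == '__main__':
    print(build_events_url(11, 2018))
```

If the printed URL and the one in DevTools differ, the difference usually points at a missing or misnamed query parameter.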

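I found two API calls, one for "productions" and one for "events". The "events" response contains more information. The code below calls the "events" endpoint and then returns a subset of that data (specifically, based on your description, the title, date, and time).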
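When an endpoint returns a large JSON document, it helps to list every field path before deciding which ones to keep. A small sketch, assuming only that the parsed response is made of nested dicts and lists (which is what `r.json()` returns); `key_paths` is a hypothetical helper, and the sample record is made up, using only the two field names the code below reads:

```python
def key_paths(obj, prefix=''):
    """Yield (dotted_path, value) for every leaf in nested dicts/lists."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = '{}.{}'.format(prefix, key) if prefix else key
            yield from key_paths(value, path)
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            # List items are addressed by index, e.g. '[0]'.
            yield from key_paths(value, '{}[{}]'.format(prefix, index))
    else:
        # A scalar leaf (str, int, float, bool, None).
        yield prefix, obj


if __name__ == '__main__':
    # Made-up sample shaped like the list of events the code below iterates over.
    sample = [{'title': 'Tosca', 'eventDateTime': '2018-11-02T20:00:00'}]
    for path, value in key_paths(sample):
        print(path, repr(value))
```

With a live response you would pass `r.json()` instead of `sample` and scan the printed paths for the title, date and time fields.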

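I wasn't sure what you want to do with this data, so I just print it out. If the code needs updating or modifying, let me know and I'll do my best to help!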
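Since the goal is a monthly calendar, one possible next step is to group the results by date. A minimal sketch, assuming rows shaped like the dicts `get_name_date_time` returns (`title`, `date`, `time` keys); `group_by_date` is a hypothetical helper, and the sample rows are copied from the console output below. `OrderedDict` is used because the answer targets Python 3.6, where plain-dict ordering is not yet a language guarantee:

```python
from collections import OrderedDict


def group_by_date(rows):
    """Hypothetical helper: map each date string to a list of (time, title)
    pairs, keeping dates in the order they first appear."""
    calendar = OrderedDict()
    for row in rows:
        calendar.setdefault(row['date'], []).append((row['time'], row['title']))
    return calendar


if __name__ == '__main__':
    rows = [
        {'title': 'Tosca', 'date': 'Friday, November 02, 2018', 'time': '08:00 PM'},
        {'title': 'Carmen', 'date': 'Saturday, November 03, 2018', 'time': '01:00 PM'},
        {'title': 'Marnie', 'date': 'Saturday, November 03, 2018', 'time': '08:00 PM'},
    ]
    for date, shows in group_by_date(rows).items():
        print(date)
        for time, title in shows:
            print('    {}  {}'.format(time, title))
```

This relies on the API already returning events in chronological order, as it does in the output below; if that ever changes, sort the events by `eventDateTime` first.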

I wrote this code using Python 3.6.4:
from datetime import datetime

import requests

BASE_URL = 'http://www.metopera.org/api/v1/calendar'
EVENT = """\
Title: {title}
Date:  {date}
Time:  {time}
---------------\
"""


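def get_events(*, month, year):
    # Query parameters sent to the "events" endpoint
    params = {
        'month': month,
        'year': year
    }
    # timeout avoids hanging forever if the server never answers
    r = requests.get('{}/events'.format(BASE_URL), params=params, timeout=10)
    r.raise_for_status()  # raise an HTTPError on 4xx/5xx responses
    return r.json()  # parsed JSON: a list of event dicts


def get_name_date_time(*, events):
    # Keep only the title, date and time of each event
    result = []
    for event in events:
        # eventDateTime looks like '2018-11-02T20:00:00'
        d = datetime.strptime(event['eventDateTime'], '%Y-%m-%dT%H:%M:%S')
        result.append({
            'title': event['title'],
            'date': d.strftime('%A, %B %d, %Y'),  # e.g. Friday, November 02, 2018
            'time': d.strftime('%I:%M %p')        # e.g. 08:00 PM
        })
    return result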


if __name__ == '__main__':
    events = get_events(month=11, year=2018)
    names_dates_times = get_name_date_time(events=events)

    for event in names_dates_times:
        print(EVENT.format(**event))

Console output:

Title: Tosca
Date:  Friday, November 02, 2018
Time:  08:00 PM
---------------
Title: Carmen
Date:  Saturday, November 03, 2018
Time:  01:00 PM
---------------
Title: Marnie
Date:  Saturday, November 03, 2018
Time:  08:00 PM
---------------
Title: Tosca
Date:  Monday, November 05, 2018
Time:  08:00 PM
---------------
Title: Carmen
Date:  Tuesday, November 06, 2018
Time:  07:30 PM
---------------
Title: Marnie
Date:  Wednesday, November 07, 2018
Time:  07:30 PM
---------------
Title: Mefistofele
Date:  Thursday, November 08, 2018
Time:  07:30 PM
---------------
Title: Tosca
Date:  Friday, November 09, 2018
Time:  08:00 PM
---------------
Title: Marnie
Date:  Saturday, November 10, 2018
Time:  01:00 PM
---------------
Title: Carmen
Date:  Saturday, November 10, 2018
Time:  08:00 PM
---------------
Title: Mefistofele
Date:  Monday, November 12, 2018
Time:  07:30 PM
---------------
Title: Tosca
Date:  Tuesday, November 13, 2018
Time:  07:30 PM
---------------
Title: Les Pêcheurs de Perles  (The Pearl Fishers)
Date:  Wednesday, November 14, 2018
Time:  07:30 PM
---------------
Title: Carmen
Date:  Thursday, November 15, 2018
Time:  07:30 PM
---------------
Title: Mefistofele
Date:  Friday, November 16, 2018
Time:  07:30 PM
---------------
Title: Tosca
Date:  Saturday, November 17, 2018
Time:  01:00 PM
---------------
Title: Les Pêcheurs de Perles  (The Pearl Fishers)
Date:  Saturday, November 17, 2018
Time:  08:00 PM
---------------
Title: Mefistofele
Date:  Monday, November 19, 2018
Time:  07:30 PM
---------------
Title: Les Pêcheurs de Perles  (The Pearl Fishers)
Date:  Tuesday, November 20, 2018
Time:  08:00 PM
---------------
Title: Il Trittico
Date:  Friday, November 23, 2018
Time:  07:30 PM
---------------
Title: Les Pêcheurs de Perles  (The Pearl Fishers)
Date:  Saturday, November 24, 2018
Time:  01:00 PM
---------------
Title: Mefistofele
Date:  Saturday, November 24, 2018
Time:  08:00 PM
---------------
Title: Il Trittico
Date:  Monday, November 26, 2018
Time:  07:30 PM
---------------
Title: Mefistofele
Date:  Tuesday, November 27, 2018
Time:  07:30 PM
---------------
Title: Les Pêcheurs de Perles  (The Pearl Fishers)
Date:  Wednesday, November 28, 2018
Time:  07:30 PM
---------------
Title: La Bohème
Date:  Thursday, November 29, 2018
Time:  07:30 PM
---------------
Title: Il Trittico
Date:  Friday, November 30, 2018
Time:  07:30 PM
---------------

For reference, here is a link to the full JSON response from the events endpoint. It contains more information you might find interesting, but I only grabbed what you asked for in your description.
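If you later want more of those fields, one option is to request them by name and tolerate records where a key is missing, instead of indexing with `event['...']`, which raises `KeyError`. A minimal sketch; `pick_fields` is a hypothetical helper and `someOtherField` is a placeholder, so check the real key names in the JSON response before relying on them:

```python
def pick_fields(event, keys, default=None):
    """Hypothetical helper: copy only the requested keys from one event dict,
    filling in `default` for any key the record does not contain."""
    return {key: event.get(key, default) for key in keys}


if __name__ == '__main__':
    # Made-up record; 'someOtherField' stands in for any extra key you might want.
    event = {'title': 'Tosca', 'eventDateTime': '2018-11-02T20:00:00'}
    print(pick_fields(event, ['title', 'eventDateTime', 'someOtherField']))
```

Used inside the loop in `get_name_date_time`, this lets you add columns to the output by extending the `keys` list rather than editing the parsing code.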