Question

Given

url = 'https://www.example.com/MOVxxxx/YYYY-MM-DD'

I loop over the pages using ranges for xxxx (the movie ID) and for the date:

# Assuming get is requests.get
from requests import get
from bs4 import BeautifulSoup

for page1 in mov_pages:
    for page2 in date_pages:
        # Build the URL as <base>/<movie page>/<date page>
        response = get('https://www.example.com/' + page1 + '/' + page2)
        page_html = BeautifulSoup(response.text, 'html.parser')
        containers_1 = page_html.find_all('ul', class_='showtime-lists')
        containers_2 = page_html.find_all('div', class_='day')
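For reference, here is a minimal sketch of how the two page lists could be generated from ranges. The helper name `build_urls`, the ID range, and the start date are hypothetical and only follow the MOVxxxx/YYYY-MM-DD pattern shown above.

```python
from datetime import date, timedelta

def build_urls(movie_ids, start, days, base="https://www.example.com"):
    """Return one URL per (movie ID, date) pair, in the MOVxxxx/YYYY-MM-DD form."""
    # Hypothetical page names derived from the URL pattern in the question
    mov_pages = [f"MOV{movie_id}" for movie_id in movie_ids]
    date_pages = [(start + timedelta(days=n)).isoformat() for n in range(days)]
    return [f"{base}/{m}/{d}" for m in mov_pages for d in date_pages]

# Two movie IDs and two consecutive days give four URLs
urls = build_urls(range(3247, 3249), date(2017, 10, 14), 2)
```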

The HTML structure looks like this:

<ul class="showtime-lists">
    <li>...</li>
        <format="2D" movie="3247" time="12.00"
    <li>...</li>
        <format="2D" movie="3247" time="13.30"
    <li>...</li>
        ...

and there is another structure (on the same page):

<div class="day">
    <a href date="2017-10-15">..</a>
<div class="day">
    <a href date="2017-10-15">..</a>
...

My goal is to build a data frame from these lists, in the following format:

date        time    movie
2017-10-14  12.00   3247
2017-10-14  13.30   3247
2017-10-14  12.00   3252
...
2017-10-15
2017-10-15
...     ... ...

The problem is:

* the two structures yield lists of different lengths

My best attempt:

* I can create the df with time and movie correctly, but not with the date (because the date list doesn't have the same length)

My code:

    # Extract movie IDs (times are extracted the same way)
    movie = []
    for container in containers_1:
        idx = container['movie']
        movie.append(idx)

    # Extract dates
    date_id = []
    for each in containers_2:
        date_idx = each.a['date']
        date_id.append(date_idx)

Output:

The movie and time lists have the same length, but the date list does not.
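One way the mismatch could be avoided is to build one row per showtime and tag each row with the date of the page being scraped, instead of collecting dates in a separate list. The sketch below assumes the page date is the one already used in the URL (`page2` in the loop) and that each showtime element exposes `movie` and `time` attributes; the helper `rows_for_page` and the stand-in data are hypothetical. BeautifulSoup tags support the same `tag["attr"]` lookup as the plain dicts used here.

```python
def rows_for_page(page_date, showtime_tags):
    """Pair every showtime on one page with that page's date.

    showtime_tags: items supporting item["movie"] and item["time"],
    e.g. BeautifulSoup tags or plain dicts.
    """
    return [
        {"date": page_date, "time": tag["time"], "movie": tag["movie"]}
        for tag in showtime_tags
    ]

# Stand-in data; in the real loop these would come from each parsed page
pages = {
    "2017-10-14": [
        {"movie": "3247", "time": "12.00"},
        {"movie": "3247", "time": "13.30"},
    ],
    "2017-10-15": [{"movie": "3247", "time": "12.00"}],
}

rows = []
for page_date, tags in pages.items():
    rows.extend(rows_for_page(page_date, tags))
```

Because every row carries its own date, the three columns always have the same length, and a list of dicts like `rows` can be passed directly to `pandas.DataFrame`.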

BeautifulSoup - different result lengths

0 answers: