BeautifulSoup - different result lengths

Posted: 2017-10-16 15:52:21

Tags: python list web-scraping beautifulsoup

Starting from this URL:

url = 'https://www.example.com/MOVxxxx/YYYY-MM-DD'

I loop over ranges of pages, varying the movie ID (MOVxxxx) and the date (YYYY-MM-DD):

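from requests import get
from bs4 import BeautifulSoup

# mov_pages: movie-ID segments such as 'MOVxxxx'; date_pages: 'YYYY-MM-DD' strings
for page1 in mov_pages:
    for page2 in date_pages:
        # e.g. https://www.example.com/MOVxxxx/YYYY-MM-DD
        response = get('https://www.example.com/' + page1 + '/' + page2)
        page_html = BeautifulSoup(response.text, 'html.parser')
        containers_1 = page_html.find_all('ul', class_='showtime-lists')
        containers_2 = page_html.find_all('div', class_='day')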
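The nested loop above can also be written with `itertools.product`. This is a minimal sketch, assuming `mov_pages` holds path segments such as `'MOV3247'` and `date_pages` holds `'YYYY-MM-DD'` strings (both example values are hypothetical); `page_urls` is a hypothetical helper name. Yielding the date next to each URL keeps it available after the page has been fetched and parsed.

```python
from itertools import product


def page_urls(mov_pages, date_pages, base='https://www.example.com'):
    """Yield (movie_page, date, url) for every movie/date combination.

    product() iterates in the same order as the nested for-loops:
    the first argument is the outer loop, the second the inner loop.
    """
    for page1, page2 in product(mov_pages, date_pages):
        yield page1, page2, f'{base}/{page1}/{page2}'


# hypothetical movie IDs and dates, for illustration only
for movie_page, day, url in page_urls(['MOV3247', 'MOV3252'], ['2017-10-14', '2017-10-15']):
    print(movie_page, day, url)
```

In the real loop, each `url` would be passed to `get()` and `day` reused later when building rows.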

The HTML structure looks like this:

<ul class="showtime-lists">
    <li>...</li>
        <format="2D" movie="3247" time="12.00"
    <li>...</li>
        <format="2D" movie="3247" time="13.30"
    <li>...</li>
        ...

and another structure (on the same page):

<div class="day">
    <a href date="2017-10-15">..</a>
<div class="day">
    <a href date="2017-10-15">..</a>
...
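The two structures are independent: the `showtime-lists` entries occur once per showtime, while the `day` links occur once per selectable date, so their counts need not match. A minimal sketch of this, using the standard library's `html.parser.HTMLParser` in place of BeautifulSoup so it runs without third-party packages. It assumes the `movie` and `time` attributes sit on the `<li>` tags (the snippet above does not show the exact tag); `SAMPLE` and `ShowtimeParser` are hypothetical.

```python
from html.parser import HTMLParser

# Hypothetical page fragment: assumes movie/time attributes sit on <li> tags
SAMPLE = '''
<ul class="showtime-lists">
  <li format="2D" movie="3247" time="12.00">...</li>
  <li format="2D" movie="3247" time="13.30">...</li>
  <li format="2D" movie="3252" time="12.00">...</li>
</ul>
<div class="day"><a href="#" date="2017-10-14">..</a></div>
<div class="day"><a href="#" date="2017-10-15">..</a></div>
'''


class ShowtimeParser(HTMLParser):
    """Collect showtime attributes and day-link dates into separate lists."""

    def __init__(self):
        super().__init__()
        self.showtimes = []  # one (movie, time) pair per <li movie=...>
        self.dates = []      # one date per <a date=...>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)  # attrs arrives as a list of (name, value) tuples
        if tag == 'li' and 'movie' in attrs:
            self.showtimes.append((attrs['movie'], attrs.get('time')))
        elif tag == 'a' and 'date' in attrs:
            self.dates.append(attrs['date'])


parser = ShowtimeParser()
parser.feed(SAMPLE)
print(len(parser.showtimes), len(parser.dates))  # 3 2
```

With BeautifulSoup, the equivalent selection would be along the lines of `page_html.find_all('li', attrs={'movie': True})`, under the same assumption about where the attributes live.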

My goal is to build a DataFrame from these lists in the following format:

date        time    movie
2017-10-14  12.00   3247
2017-10-14  13.30   3247
2017-10-14  12.00   3252
...
2017-10-15
2017-10-15
...     ... ...
The problem:

* the two structures yield lists of different lengths

My best attempt so far:

* I can create the DataFrame with time and movie correctly, but not with the date, because the date list doesn't have the same length.
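Pairing the lists by position only works if element i of every list describes the same showtime. A small stdlib illustration with hypothetical values: `zip` silently drops the extra showtimes, and `itertools.zip_longest` pads with `None`; neither gives each showtime its real date. (pandas, for its part, raises a `ValueError` when asked to build a DataFrame from a dict of columns with unequal lengths.)

```python
from itertools import zip_longest

# hypothetical values scraped from one page
movies = ['3247', '3247', '3252']
times = ['12.00', '13.30', '12.00']
dates = ['2017-10-14', '2017-10-15']  # day links, not one per showtime

# zip stops at the shortest list: the third showtime is lost
truncated = list(zip(dates, times, movies))
# zip_longest pads with None: lengths match, but dates are still misaligned
padded = list(zip_longest(dates, times, movies))

print(len(truncated), len(padded))  # 2 3
print(padded[2])  # (None, '12.00', '3252')
```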

My code:

    # extract movie (time was extracted in a similar way)
    movie = []
    for container in containers_1:
        idx = container['movie']
        movie.append(idx)

    # extract date
    date_id = []
    for each in containers_2:
        date_idx = each.a['date']
        date_id.append(date_idx)

Output:

movie and time have the same length, but date does not.
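One possible approach, assuming every showtime on a page belongs to the date in that page's URL (`page2` in the loop): attach `page2` to each showtime while still inside the loop, instead of collecting the day links separately. Collecting rows inside the loop also matters because `containers_1` and `containers_2` are reassigned on every iteration. A minimal sketch; `rows_for_page` and the `scraped` dictionary are hypothetical stand-ins for values extracted in the real loop.

```python
def rows_for_page(date, showtimes):
    """Attach the page's date to every (movie, time) pair scraped from that page."""
    return [{'date': date, 'time': time, 'movie': movie} for movie, time in showtimes]


# hypothetical results keyed by (page1, page2); in the real loop, `date` would be
# page2 and `showtimes` the (movie, time) pairs extracted from containers_1
scraped = {
    ('MOV3247', '2017-10-14'): [('3247', '12.00'), ('3247', '13.30')],
    ('MOV3252', '2017-10-14'): [('3252', '12.00')],
}

all_rows = []
for (_movie_page, date), showtimes in scraped.items():
    all_rows.extend(rows_for_page(date, showtimes))

for row in all_rows:
    print(row['date'], row['time'], row['movie'])
```

Because each row carries its own date, every column has the same length by construction; with pandas, `pd.DataFrame(all_rows, columns=['date', 'time', 'movie'])` would then build the frame shown in the question.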

0 answers