Problem scraping a scrolling web page with BeautifulSoup in Python

Asked: 2020-01-29 18:06:55

Tags: python html web-scraping beautifulsoup

I am running into a problem when I try to scrape information from this page.

My code is here: '''

import requests
from bs4 import BeautifulSoup


request = requests.get("https://www.aiscore.com/basketball/20200128")

page = request.content
soup = BeautifulSoup(page, 'html.parser')
print(soup.prettify())

matchs = soup.findAll("div", {"class":"list"})

for match in matchs:

    hour = match.find("span", {"class":"fs-12 flex-1 text-center"})
    hour = hour.text

    status = match.find("div", {"class":"fs-12 color-999 flex-1 text-center"})
    status = status.text

    teams = match.findAll("div", {"class":"w-o-h"})
    i = 1
    for team in teams:
        if i == 1:
            t1 = team.text
        elif i == 2:
            t2 = team.text
        else:
            print("+ de 2 équipes dans le match")
        i += 1

    scores = match.findAll("div", {"class":"flex align-center justify-center fs-12 color-999 w-bar-100 flex-1"})
    i = 1
    for score in scores:
        scs_qtps = score.findAll("div", {"class":"flex-1 text-center isVisible"})
        if i == 1:
            k = 1
            for sc_qtp in scs_qtps:
                if k == 1:
                    sc_qt1_t1 = sc_qtp.text
                elif k == 2:
                    sc_qt2_t1 = sc_qtp.text
                elif k == 3:
                    sc_qt3_t1 = sc_qtp.text
                elif k == 4:
                    sc_qt4_t1 = sc_qtp.text
                else :
                    print("plus de 4 quart tps")
                k += 1
            sc_final_t1 = score.find("div", {"class":"flex-1 text-center"})
            sc_final_t1 = sc_final_t1.text
        elif i == 2:
            k = 1
            for sc_qtp in scs_qtps:
                if k == 1:
                    sc_qt1_t2 = sc_qtp.text
                elif k == 2:
                    sc_qt2_t2 = sc_qtp.text
                elif k == 3:
                    sc_qt3_t2 = sc_qtp.text
                elif k == 4:
                    sc_qt4_t2 = sc_qtp.text
                else :
                    print("plus de 4 quart tps")
                k += 1
            sc_final_t2 = score.find("div", {"class":"flex-1 text-center"})
            sc_final_t2 = sc_final_t2.text
        i += 1

    odds = match.findAll("div", {"style":"height: 19px; line-height: 19px; color: rgb(102, 102, 102);"})
    i = 1
    for odd in odds:
        if i == 1:
            odd_t1 = odd.text
        elif i == 2:
            odd_t2 = odd.text
        i += 1

    print(hour, status, t1, t2)
    print(sc_qt1_t1, sc_qt2_t1, sc_qt3_t1, sc_qt4_t1, "%t", sc_final_t1)
    print(sc_qt1_t2, sc_qt2_t2, sc_qt3_t2, sc_qt4_t2, "%t", sc_final_t2)
    print("1 :", odd_t1, "; 2 :", odd_t2)

'''

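I want to scrape all the scores, but there is a problem: I cannot access all the data in the HTML page. In fact, all the information I want to scrape is located inside this div: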

<div class="vue-recycle-scroller scroller page-mode direction-vertical">
</div>

But when I print the HTML page with print(soup.prettify()), there is nothing inside this div except <!-- -->.
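One way to confirm that the container really arrives empty in the HTML that requests downloads is sketched below. The `static_html` string is a hypothetical stand-in for the downloaded page, and `container_is_empty` is an illustrative helper, not part of BeautifulSoup. If the check returns True for the real response, the rows are most likely filled in by JavaScript after the page loads, which requests does not execute.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the HTML returned by requests:
# the container is present but holds only a comment placeholder.
static_html = """
<div class="vue-recycle-scroller scroller page-mode direction-vertical"><!-- --></div>
"""


def container_is_empty(html, css_selector):
    """Return True if the selected element has no child tags and no visible text."""
    soup = BeautifulSoup(html, "html.parser")
    container = soup.select_one(css_selector)
    if container is None:
        raise ValueError(f"no element matches {css_selector!r}")
    has_child_tags = container.find(True) is not None  # any descendant tag at all
    has_text = bool(container.get_text(strip=True))    # comments are not counted as text
    return not has_child_tags and not has_text


print(container_is_empty(static_html, "div.vue-recycle-scroller"))  # prints True
```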

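So my question is: how can I access the information located inside this div? I am open to all answers (maybe I should use Selenium to scrape this kind of information?)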
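For reference, a minimal sketch of what the Selenium route could look like, assuming the rows are rendered by JavaScript. The helper names `fetch_rendered_html` and `count_matches` and the wait selector are illustrative assumptions built from the class names in the question, and the sketch assumes Selenium 4 with a local Chrome install. It is not a verified solution for this particular site.

```python
from bs4 import BeautifulSoup


def fetch_rendered_html(url, wait_selector, timeout=15):
    """Load url in headless Chrome and return the DOM once wait_selector appears."""
    # Imported here so the parsing helper below works without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        # Wait until at least one element matching the selector exists in the DOM.
        WebDriverWait(driver, timeout).until(
            EC.presence_of_element_located((By.CSS_SELECTOR, wait_selector))
        )
        return driver.page_source
    finally:
        driver.quit()


def count_matches(html):
    """Count the div elements carrying the class "list", as in the question's loop."""
    soup = BeautifulSoup(html, "html.parser")
    return len(soup.find_all("div", class_="list"))


# Example use (needs Chrome and the selenium package, so it is left commented out):
# html = fetch_rendered_html(
#     "https://www.aiscore.com/basketball/20200128",
#     "div.vue-recycle-scroller div.list",  # assumed selector
# )
# print(count_matches(html))
```

The returned HTML can be passed to the existing BeautifulSoup code in place of request.content. Because the container looks like a virtual scroller, it may only hold the rows currently on screen, in which case the page would also have to be scrolled and re-read in steps; that part is not covered here.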

Thank you very much!

Sorry for my basic English.

1 Answer:

Answer 0: (score: 0)

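Since we can't see your script, it's hard to tell, but I think you may need to double-check your class names. Whenever you scrape something, it's best to press Ctrl + Shift + C and hover over the item you want to scrape; this will help you get the correct class. Hope this helps.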
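On the class-name point, a small sketch of one thing worth checking: when a multi-class string is passed to find, BeautifulSoup compares it against the exact value of the class attribute, so a different class order in the page means no match. The markup below is hypothetical, and `text_or_none` is an illustrative helper.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the same three classes as in the question, in another order.
html = '<span class="text-center fs-12 flex-1">19:00</span>'
soup = BeautifulSoup(html, "html.parser")

# Exact-string match on the class attribute: fails because the order differs.
exact = soup.find("span", {"class": "fs-12 flex-1 text-center"})

# CSS selector: matches any span that carries all three classes, in any order.
by_selector = soup.select_one("span.fs-12.flex-1.text-center")


def text_or_none(tag):
    """Return the stripped text of a tag, or None when the lookup found nothing."""
    return tag.get_text(strip=True) if tag is not None else None


print(text_or_none(exact))        # prints None
print(text_or_none(by_selector))  # prints 19:00
```

Guarding against None this way also avoids the AttributeError that a bare .text raises when find returns nothing.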