Web scraping hidden text that is not in the page source?

Time: 2017-12-05 07:07:08

Tags: javascript python html web-scraping beautifulsoup

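I'm writing a web scraper to get the full curriculum of a Udemy course. I'm using Beautiful Soup and requests in Python. However, the last few sections of the curriculum on the page are collapsed, and you have to click to expand them. How can I extract the whole curriculum?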

URL: https://www.udemy.com/python-the-complete-python-developer-course/

from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as Soup

my_url = "https://www.udemy.com/python-the-complete-python-developer-course/"
head = {'User-Agent':'Mozilla/5.0'}
pagereq = Request(my_url, headers=head)

pager = urlopen(pagereq)

page = pager.read()
pager.close()
Sp = Soup(page, "html.parser")
Sections = Sp.findAll("div", {"class": "content-container"})
numlec = Sp.find("div", {"class": "num-lectures"})

for section in Sections:
    SecTitle = section.find("span", {"class": "lecture-title-text"}).text.strip()
    SecLen = section.find("span", {"class": "section-header-length"}).text.strip()
    lectures = section.findAll("div", {"class": "lecture-container"})
    print("-" * 40)
    print(SecTitle+"\t"+SecLen)
    print()
    for lecture in lectures:
        name = lecture.find("div", {"class": "title"}).text.strip()
        leng = lecture.find("span", {"class": "content-summary"}).text.strip()
        print("\t {}\t{}".format(name, leng))
    print("-" * 40)

This scrapes all the data up to the collapsed text, but I want the full curriculum. Is there a simple way to do that?

1 answer:

Answer 0 (score: 0)

Try this. It first clicks the load-more button (`.content-container.js-load-more`), then clicks each plus button to expand all the hidden items, and finally fetches every section title and its curriculum from the page.

from selenium import webdriver
from selenium.webdriver.common.by import By
import time

driver = webdriver.Chrome()
driver.get("https://www.udemy.com/python-the-complete-python-developer-course/")
time.sleep(2)
driver.find_element(By.CSS_SELECTOR, ".content-container.js-load-more").click()
for link in driver.find_elements(By.CSS_SELECTOR, ".lecture-title-text"):
    link.click()
time.sleep(2)
for items in driver.find_elements(By.CSS_SELECTOR, ".content-container"):
    title = items.find_element(By.CSS_SELECTOR, ".lecture-title-text").text
    course_list = ' '.join([item.text for item in items.find_elements(By.CSS_SELECTOR, ".title")])
    print("Course_title: {}\nCurriculum: {}\n".format(title, course_list))

driver.quit()

Partial output:

Course_title: Introduction
Curriculum: 

Course_title: Python Setup for Windows
Curriculum: Introduction Install Python on Windows IDLE On Windows with a cool demo app! Downloading and Installing IntelliJ (FREE and PAID versions) on Windows Free 90 Day Extended Trial of IntelliJ Ultimate Edition Now Available Move to next section!

Course_title: Python Setup for Mac
Curriculum: Introduction Downloading And Installing Python On Mac OS X IDLE on Mac OS X with a cool demo app! Downloading and Installing IntelliJ (FREE and PAID version) for a Mac Free 90 Day Extended Trial of IntelliJ Ultimate Edition Now Available Move to next section!
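
If you would rather keep the Beautiful Soup parsing from the question, one possible variation is to let Selenium do only the clicking and then pass the rendered HTML to Beautiful Soup. The sketch below is untested against the live page. The function names `parse_curriculum` and `fetch_rendered_html` and the `pause` parameter are made up for illustration, the class names are the ones used in the question and the answer, and it assumes the expanded lectures appear in `driver.page_source` after the clicks.

```python
from bs4 import BeautifulSoup


def parse_curriculum(html):
    """Return a list of (section_title, [lecture_titles]) from rendered HTML."""
    soup = BeautifulSoup(html, "html.parser")
    curriculum = []
    for section in soup.find_all("div", {"class": "content-container"}):
        title_tag = section.find("span", {"class": "lecture-title-text"})
        if title_tag is None:
            continue  # e.g. the load-more container has no section title
        lectures = []
        for lecture in section.find_all("div", {"class": "lecture-container"}):
            name = lecture.find("div", {"class": "title"})
            if name is not None:
                lectures.append(name.text.strip())
        curriculum.append((title_tag.text.strip(), lectures))
    return curriculum


def fetch_rendered_html(url, pause=2):
    """Open the page in Chrome, expand the curriculum, return the HTML."""
    # Imported lazily so parse_curriculum works without Selenium installed.
    import time
    from selenium import webdriver
    from selenium.webdriver.common.by import By

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        time.sleep(pause)
        # Same selectors as in the answer; they are assumptions about the page.
        driver.find_element(By.CSS_SELECTOR, ".content-container.js-load-more").click()
        for link in driver.find_elements(By.CSS_SELECTOR, ".lecture-title-text"):
            link.click()
        time.sleep(pause)
        return driver.page_source
    finally:
        driver.quit()
```

Calling `parse_curriculum(fetch_rendered_html(my_url))` with the URL from the question would then give the sections as plain Python tuples, which can be printed with the same loop the question already uses.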