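I'm building a web scraper to get the complete curriculum of a Udemy course, using BeautifulSoup and `urllib.request` in Python. However, the last few sections of the curriculum on the page are collapsed, and you have to click to expand them. How can I extract the entire curriculum?

URL: https://www.udemy.com/python-the-complete-python-developer-course/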
```python
from urllib.request import Request, urlopen
from bs4 import BeautifulSoup as Soup

my_url = "https://www.udemy.com/python-the-complete-python-developer-course/"
head = {'User-Agent': 'Mozilla/5.0'}
pagereq = Request(my_url, headers=head)
pager = urlopen(pagereq)
page = pager.read()
pager.close()

Sp = Soup(page, "html.parser")
Sections = Sp.findAll("div", {"class": "content-container"})
numlec = Sp.find("div", {"class": "num-lectures"})

for section in Sections:
    SecTitle = section.find("span", {"class": "lecture-title-text"}).text.strip()
    SecLen = section.find("span", {"class": "section-header-length"}).text.strip()
    lectures = section.findAll("div", {"class": "lecture-container"})
    print("-" * 40)
    print(SecTitle + "\t" + SecLen)
    print()
    for lecture in lectures:
        name = lecture.find("div", {"class": "title"}).text.strip()
        leng = lecture.find("span", {"class": "content-summary"}).text.strip()
        print("\t {}\t{}".format(name, leng))
print("-" * 40)
```
This scrapes all the data up to the collapsed part, but I want the complete curriculum. Is there a simple way to do this?
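The parsing part of the code above can be isolated into a function that takes an HTML string. That makes it testable against a saved copy of the page, and lets the same logic run on HTML obtained some other way, for example from a browser after the collapsed sections have been loaded. A minimal sketch: `parse_curriculum` and `_text` are hypothetical names, and the class names are copied from the question, so they may no longer match Udemy's current markup.

```python
from bs4 import BeautifulSoup


def _text(tag):
    """Return the stripped text of a tag, or "" if the tag was not found."""
    return tag.get_text(strip=True) if tag is not None else ""


def parse_curriculum(html):
    """Parse curriculum HTML into a list of section dicts.

    Class names come from the question; Udemy's markup may differ today.
    """
    soup = BeautifulSoup(html, "html.parser")
    sections = []
    # class_ matches any element whose class list contains "content-container".
    for section in soup.find_all("div", class_="content-container"):
        title = section.find("span", class_="lecture-title-text")
        if title is None:
            # Skip containers that have no section title instead of
            # failing with AttributeError on .text.
            continue
        lectures = [
            (
                _text(lecture.find("div", class_="title")),
                _text(lecture.find("span", class_="content-summary")),
            )
            for lecture in section.find_all("div", class_="lecture-container")
        ]
        sections.append({
            "title": _text(title),
            "length": _text(section.find("span", class_="section-header-length")),
            "lectures": lectures,
        })
    return sections
```

With `page` from the question's code, `parse_curriculum(page)` returns the same data the loop prints. It only sees what is in the HTML it is given, so on the raw download it still stops at the collapsed part.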
Answer (score: 0)
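Try this. It first clicks the "load more" button, then clicks each section header (the plus buttons) to expand all the hidden items, and finally prints every section title with its curriculum from the page. The code uses the Selenium 4 locator API (`find_element(By.CSS_SELECTOR, ...)`), because the older `find_element_by_css_selector` helpers were removed in Selenium 4.3.

```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://www.udemy.com/python-the-complete-python-developer-course/")
time.sleep(2)  # crude wait for the page's JavaScript to render the curriculum

# Click the "load more" container to reveal the remaining sections.
driver.find_element(By.CSS_SELECTOR, ".content-container.js-load-more").click()

# Click each section header to expand its list of lectures.
for link in driver.find_elements(By.CSS_SELECTOR, ".lecture-title-text"):
    link.click()
time.sleep(2)  # give the expanded sections time to render

for items in driver.find_elements(By.CSS_SELECTOR, ".content-container"):
    title = items.find_element(By.CSS_SELECTOR, ".lecture-title-text").text
    course_list = ' '.join(item.text for item in items.find_elements(By.CSS_SELECTOR, ".title"))
    print("Course_title: {}\nCurriculum: {}\n".format(title, course_list))

driver.quit()
```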
Partial output:

```
Course_title: Introduction
Curriculum:

Course_title: Python Setup for Windows
Curriculum: Introduction Install Python on Windows IDLE On Windows with a cool demo app! Downloading and Installing IntelliJ (FREE and PAID versions) on Windows Free 90 Day Extended Trial of IntelliJ Ultimate Edition Now Available Move to next section!

Course_title: Python Setup for Mac
Curriculum: Introduction Downloading And Installing Python On Mac OS X IDLE on Mac OS X with a cool demo app! Downloading and Installing IntelliJ (FREE and PAID version) for a Mac Free 90 Day Extended Trial of IntelliJ Ultimate Edition Now Available Move to next section!
```