import requests
from bs4 import BeautifulSoup
r = requests.get("https://www.udacity.com/courses/all")
soup = BeautifulSoup(r.text)
summaries = soup.find_all("li", class_="") # using "card-list_catalogCardListItem__aUQtx" for class_ returned 0 matches
print('Number of Courses:', len(summaries)) # this finds 225 matches
summaries[7].select_one("li").get_text().strip() #output: 'AI for Business Leaders'
summaries[7].select_one("a").get_text().strip() #output:'Artificial Intelligence'
courses = []
for summary in summaries:
    title = summary.select_one("a").get_text().strip()
    school = summary.select_one("li").get_text().strip()
    courses.append((title, school))
# extracting the text from all summaries this way raises "AttributeError: 'NoneType' object has no attribute 'get_text'"
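For educational purposes, I want to extract
1) all Udacity courses, 2) the school each one belongs to, and 3) a short description.
I tried to do this with find_all, using the code above. My manual search shows that there are 264 courses on the page. I first used find_all("li", class_="card-list_catalogCardListItem__aUQtx"), which returned 0 results. When I left class_ empty, just as a test, the closest count I got was 225. However, when I use a for loop to extract all the courses, it ends in an AttributeError, probably because not every summary that was found is readable: "'NoneType' object has no attribute 'get_text'".
My question: how can I do this? (Locating the tags with find_all seems to fail.)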
Answer 0: (score: 1)
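The page is loaded dynamically by sending a GET request to:
https://www.udacity.com/data/catalog.json?v=%223cd8649e%22
You can send a request to that link to receive all the data, and then access the keys and values as a Python dictionary (dict):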
import requests
url = "https://www.udacity.com/data/catalog.json?v=%223cd8649e%22"
response = requests.get(url).json()
for data in response:
    course = data["payload"]
    if "shortSummary" in course:
        print("{:<50} {:<60} {:<50}".format(course["school"], course["title"], course["shortSummary"]))
Output (truncated):
School of Data Science Data Engineer Data Engineering is the foundation for the new world of Big Data. Enroll now to build production-ready data infrastructure, an essential skill for advancing your data career.
School of Data Science Data Scientist Build effective machine learning models, run data pipelines, build recommendation systems, and deploy solutions to the cloud with industry-aligned projects.
School of Data Science Data Analyst Use Python, SQL, and statistics to uncover insights, communicate critical findings, and create data-driven solutions.
School of Data Science Programming for Data Science with Python Learn the fundamental programming tools for data professionals: Python, SQL, the Terminal and Git.
School of Autonomous Systems C++ Get hands-on experience by building five real-world projects.
School of Product Management Product Manager Envision and execute the development of industry-defining products, and learn how to successfully bring them to market.
Using {:<50} {:<60} {:<50} left-aligns each value and pads it to the specified width.
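As a minimal offline sketch of the same filtering and left-alignment, the snippet below runs without a network request. The sample records, the format_row helper and the narrower column widths are all hypothetical, made up for illustration. They only mimic the "payload" shape used in the answer and are not real catalog data.

```python
# Hypothetical sample records that mimic the "payload" shape used in the answer;
# they are NOT real responses from the catalog endpoint.
sample = [
    {"payload": {"school": "School of Data Science",
                 "title": "Data Analyst",
                 "shortSummary": "Use Python, SQL, and statistics."}},
    {"payload": {"school": "School of AI",
                 "title": "Entry without a summary"}},
]


def format_row(course, widths=(30, 25, 40)):
    """Left-align school, title and summary in fixed-width columns (hypothetical helper)."""
    # Build a spec such as "{:<30} {:<25} {:<40}" from the widths.
    spec = " ".join("{:<%d}" % w for w in widths)
    return spec.format(course["school"], course["title"], course["shortSummary"])


# Keep only the entries that actually carry a short summary, as in the answer.
rows = [format_row(d["payload"]) for d in sample if "shortSummary" in d["payload"]]

for row in rows:
    print(row)
```

Note that {:<N} only pads: a value longer than N characters is not truncated, so long summaries will push the line past the column width, as the output above shows.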