Question

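I'm new to Python and I'm using Beautiful Soup for a web-scraping assignment. The user is asked to enter a number of course units. I'm supposed to extract the relevant information for the matching courses: course title, time, students registered, and instructor.

I started by finding the course table that contains all the course information; each course sits in its own div under the course-table tag. Then I wanted to iterate over each course to pull out the information. But the code I wrote doesn't give me anything.

Could someone look at my code? Which part am I doing wrong? Thank you in advance. The HTML link is http://classes.usc.edu/term-20181/classes/itp/ Below is my code, which asks for user input; I'm trying to use the find and find_all functions to find the class title, time, students registered, and instructor.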

from bs4 import BeautifulSoup
import urllib.request


url="http://classes.usc.edu/term-20181/classes/itp/"
page=urllib.request.urlopen(url)

soup=BeautifulSoup(page.read(),"html.parser")

# ask for user input for course units 
choiceUnits=input("Enter")
#trying to find the tag that contain all the courses information
coursesTable=soup.find("div",class_="course-table")  
#trying to find each course table under the course-table tag  
courses=coursesTable.find_all("div",class_="course-info expanded")

for course in courses:
    # trying to find the course units
    unitsTag=courses.find("span",class_="units")
    units=unitsTag.text
    #compare the course units with the user input. If they are the same, find out the course title,time,students registered and instruction 
    if units==choiceUnits:
        #find the title of the course
        titleTag=courses.find("a",class_="courselink")
        title=titleTag.text
        #find the time of the course
        timeTag=courses.find_all("td",class_="time")
        time=timeTag.text
        #find the number of students registered 
        registerTag=courses.find_all("td",class_="registered")
        register=registerTag.text
        #find the instructor 
        instructorTag=courses.find_all("td",class_="instructor")
        instructor=instructorTag.text
        #print out the result to verify 
        print(title)
        print(time,register,instructor)

Answer 1

A few things in your code are not working:

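First, the class you use to find each course, course-info expanded, does not exist until you click on a course in the browser, so you have to use course-info expandable instead.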
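To see why the original lookup comes back empty, here is a minimal sketch on a hand-written HTML fragment. The fragment is an assumption modelled on the class names in the question, not a copy of the real page; it only illustrates how class_ matching behaves.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment: the downloaded page only carries "expandable";
# "expanded" would appear in the browser after a click.
SAMPLE_HTML = """
<div class="course-table">
  <div class="course-info expandable"><span class="units">(4.0 units)</span></div>
  <div class="course-info expandable"><span class="units">(2.0 units)</span></div>
</div>
"""

soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
table = soup.find("div", class_="course-table")

# A multi-word class_ string is compared against the whole class attribute.
clicked = table.find_all("div", class_="course-info expanded")
static = table.find_all("div", class_="course-info expandable")

# A single class name matches any tag that carries it, whatever else is set.
any_course = table.find_all("div", class_="course-info")

print(len(clicked), len(static), len(any_course))
```

Searching for the single class course-info is a slightly more forgiving variant, since it does not depend on which state the div is in.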

Second, you ask the user for a number of units, but the units text you extract has the format (#.# units), so you need to account for that as well.
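One way to account for the format is to pull the number out of the scraped text and compare numerically. This is a minimal sketch; parse_units and units_match are hypothetical helper names, and it assumes the first number in the text is the unit count (a range of units would need extra handling).

```python
import re


def parse_units(units_text):
    """Return the first number in text such as "(4.0 units)" as a float, or None."""
    match = re.search(r"\d+(?:\.\d+)?", units_text)
    return float(match.group()) if match else None


def units_match(units_text, user_input):
    """Compare the scraped units text with what the user typed, numerically."""
    try:
        wanted = float(user_input.strip())
    except ValueError:
        return False  # the input was not a number
    return parse_units(units_text) == wanted


print(units_match("(4.0 units)", "4"))
print(units_match("(4.0 units)", "4.0"))
print(units_match("(2.0 units)", "4"))
```

Comparing floats rather than strings means that an input of 4 and an input of 4.0 are treated the same way.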

Lastly, inside the for loop you need to access the attributes of the course object, not courses.

With these changes, the script produces the output you wanted.
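As a rough illustration of the three fixes combined, here is a minimal sketch; it is not necessarily the code this answer originally showed. The function names find_courses and cell_text are hypothetical, the tag and class names are taken from the question, and SAMPLE_HTML with its course data is an invented placeholder so the example runs without network access.

```python
import re
from bs4 import BeautifulSoup


def cell_text(course, css_class):
    """Join the text of every <td> with the given class inside one course."""
    cells = course.find_all("td", class_=css_class)
    return ", ".join(cell.get_text(strip=True) for cell in cells)


def find_courses(html, choice_units):
    """Return a list of dicts for courses whose units equal choice_units."""
    wanted = float(choice_units)  # raises ValueError if the input is not a number
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("div", class_="course-table")
    if table is None:
        return []
    results = []
    # Fix 1: search for the class that exists in the downloaded HTML.
    for course in table.find_all("div", class_="course-info expandable"):
        # Fix 3: call find on the single course, not on the whole result list.
        units_tag = course.find("span", class_="units")
        title_tag = course.find("a", class_="courselink")
        if units_tag is None or title_tag is None:
            continue
        # Fix 2: extract the number from text such as "(4.0 units)".
        number = re.search(r"\d+(?:\.\d+)?", units_tag.get_text())
        if number is None or float(number.group()) != wanted:
            continue
        results.append({
            "title": title_tag.get_text(" ", strip=True),
            "time": cell_text(course, "time"),
            "registered": cell_text(course, "registered"),
            "instructor": cell_text(course, "instructor"),
        })
    return results


# Placeholder markup and data, invented for this example only.
SAMPLE_HTML = """
<div class="course-table">
  <div class="course-info expandable">
    <a class="courselink">ITP 000: Sample Course A</a>
    <span class="units">(4.0 units)</span>
    <table><tr>
      <td class="time">10:00-11:50am</td>
      <td class="registered">20 of 30</td>
      <td class="instructor">Instructor A</td>
    </tr></table>
  </div>
  <div class="course-info expandable">
    <a class="courselink">ITP 001: Sample Course B</a>
    <span class="units">(2.0 units)</span>
    <table><tr>
      <td class="time">2:00-3:20pm</td>
      <td class="registered">12 of 25</td>
      <td class="instructor">Instructor B</td>
    </tr></table>
  </div>
</div>
"""

for course in find_courses(SAMPLE_HTML, "4"):
    print(course["title"])
    print(course["time"], course["registered"], course["instructor"])
```

To try it against the live page, the bytes returned by urllib.request.urlopen(url).read() can be passed as the html argument and the result of input() as choice_units. Note that find_all returns a list of tags, so the text has to be read from each element in turn, which is what cell_text does.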

