Python: BeautifulSoup scrape, blank course descriptions

Time: 2018-11-26 23:51:50

Tags: python web-scraping beautifulsoup


# -*- coding: utf-8 -*-
"""
Created on Mon Nov  5 20:37:33 2018

@author: DazedFury
"""
# Here, we're just importing both Beautiful Soup and the Requests library
from bs4 import BeautifulSoup
import requests

# returns a CloudflareScraper instance
#scraper = cfscrape.create_scraper()  

#URL and textfile
text_file = open("Output.txt", "w", encoding='UTF-8')
page_link = ''
page_response = requests.get(page_link)
page_content = BeautifulSoup(page_response.content, "html.parser")

#Array for storing URL's
URLArray = []

#Find links
for link in page_content.find_all('a'):
    if('/university-course-descriptions/undergraduate' in link.get('href')):
        URLArray.append(link.get('href'))

k = 1

#Parse Loop        
while(k != 242):
    print("Writing " + str(k))

    completeURL = '' + URLArray[k]  

    # this is the url that we've already determined is safe and legal to scrape from.
    page_link = completeURL

    # here, we fetch the content from the url, using the requests library
    page_response = requests.get(page_link)

    #we use the html parser to parse the url content and store it in a variable.
    page_content = BeautifulSoup(page_response.content, "html.parser")

    #Find and print all text with tag p
    paragraphs = page_content.find_all('div', {'class' : 'course_codetitle'})
    paragraphs2 = page_content.find_all('div', {'class' : 'courseblockdesc'})
    j = 0
    for i in range(len(paragraphs)):
        if i % 2 == 0:
            if j < len(paragraphs2):
                text_file.write(" ".join(paragraphs2[j].get_text().split()))
                if(paragraphs2[j].get_text() != ""):
                    j += 1

    k += 1

#text_file.write("<p style=\"page-break-after: always;\">&nbsp;</p>")

#Close Text File
text_file.close()




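I considered simply checking whether the course description is blank, but on the site, a course without a description has no "courseblockdesc" tag at all. So when I call find_all for courseblockdesc, no element is added to the list for that course, and the descriptions end up out of order with the titles. There are too many errors to fix by hand, so I'm hoping someone can help me find a solution.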

2 answers:

Answer 0 (score: 1)


for block in page_content.find_all('div', class_="courseblock"):
    title = block.find('div', {'class' : 'course_codetitle'})
    description = block.find('div', {'class' : 'courseblockdesc'})
    #  do what you need with the navigable strings here.
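    if description:
        # for example, print the pair; a course without a description is simply skipped here
        print(title.get_text(), description.get_text())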
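
As a rough illustration of why searching inside each courseblock keeps titles and descriptions aligned, here is a minimal sketch run against made-up markup. The sample HTML, the course names, and the helper pair_courses are assumptions for illustration only; the class names are the ones used in the question.

```python
from bs4 import BeautifulSoup

# Made-up sample markup (an assumption): the second course has no courseblockdesc div.
SAMPLE_HTML = """
<div class="courseblock">
  <div class="course_codetitle">AAA 100</div>
  <div class="courseblockdesc">First description.</div>
</div>
<div class="courseblock">
  <div class="course_codetitle">AAA 200</div>
</div>
<div class="courseblock">
  <div class="course_codetitle">AAA 300</div>
  <div class="courseblockdesc">Third description.</div>
</div>
"""


def pair_courses(html):
    """Return (title, description-or-None) tuples, one per courseblock (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    pairs = []
    for block in soup.find_all("div", class_="courseblock"):
        # Searching within the block means a missing description yields None
        # for this course only, instead of shifting every later description.
        title = block.find("div", class_="course_codetitle")
        desc = block.find("div", class_="courseblockdesc")
        pairs.append((
            title.get_text(strip=True) if title else None,
            desc.get_text(strip=True) if desc else None,
        ))
    return pairs


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
# Page-wide lists can differ in length, which is why index-based pairing drifts.
titles = soup.find_all("div", class_="course_codetitle")
descs = soup.find_all("div", class_="courseblockdesc")
print(len(titles), len(descs))

for title, desc in pair_courses(SAMPLE_HTML):
    print(title, "->", desc or "No description available")
```

The two page-wide find_all calls return lists of different lengths whenever a description is missing, while the per-block lookup always produces exactly one pair per course.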

Answer 1 (score: 1)


from bs4 import BeautifulSoup
import requests

url = ""

with open("out.txt", "w", encoding="UTF-8") as f:
    for link in BeautifulSoup(requests.get(url).content, "html.parser").find_all("a"):
        if "/university-course-descriptions/undergraduate" in link["href"]:
            soup = BeautifulSoup(requests.get("" + link["href"]).content, "html.parser")

            for course in soup.find_all("div", {"class": "courseblock"}):
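                title = course.find("div", {"class" : "course_title"}).get_text().strip()

                # find() returns None when the description div is missing,
                # so calling get_text() on it raises AttributeError
                try:
                    desc = course.find("div", {"class" : "courseblockdesc"}).get_text().strip()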
                except AttributeError:
                    desc = "No description available"

                f.write(title + "\n" + desc + "\n\n")


No description available

WLED 495B: Field Experience for World Languages Teacher Preparation in Grades 1-5
WL ED 495B Field Experience for World Languages Teacher Preparation in Grades 1-5 (3) Practicum situation where Prospective World Language teachers will demonstrate acquired knowledge on second language learning/teaching and educational theories. Prospective World Language teachers will have assigned school placements and will attend a weekly seminar where issues in World Language learning and teaching will be discussed. At their assigned school placement, prospective World Language teachers will have many opportunities to observe/work with children in grades 1-5 (1) focusing on second language learning/teaching and the socio/cultural issues associated to classroom practices while implementing and self-evaluated own designed activities and lessons; (2) weekly seminars will engage students in reflective activities that will enable them to analyze each week's events; (3) inquiry projects on teaching and learning of World Languages.

WLED 495C: Field Experience for World Languages Teacher Preparation in Grades 6-12
WL ED 495C Field Experience for World Languages Teacher Preparation in Grades 6-12 (3) Practicum situation where prospective World Language teachers will demonstrate acquired knowledge on second language learning/teaching and educational theories. Prospective World Language teachers will have assigned school placements in grades 6-12 and will attend a weekly seminar where issues in World Language learning and teaching will be discussed. At their assigned school placement, prospective World Language teachers will have many opportunities to observe/work with students in grades 6-12 (1) focusing on second language learning/teaching and the socio/cultural issues associated to classroom practices while implementing and self-evaluating their own designed activities and lessons, (2) weekly seminars will engage students in reflective activities that will enable them to analyze each week's events, and (3) inquiry projects on teaching and learning of World Languages.


  • It's best to use the with keyword for file I/O. It closes the file handle automatically when the block finishes.

  • Verbose intermediate variables and comments add noise, for example:

# Here, we're just importing both Beautiful Soup and the Requests library
from bs4 import BeautifulSoup

#Close Text File
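
The first point above, using with for file I/O, could look roughly like the sketch below. The helper write_courses, the sample course data, and the temporary output path are assumptions for illustration and are not part of the original answer.

```python
import os
import tempfile


def write_courses(path, courses):
    """Write (title, description) pairs to path (hypothetical helper)."""
    # The with block closes the file when it exits, even if a write raises,
    # so no separate close() call or "close the file" comment is needed.
    with open(path, "w", encoding="UTF-8") as f:
        for title, desc in courses:
            f.write(title + "\n" + desc + "\n\n")
    return f.closed


# Made-up sample data standing in for scraped results (an assumption).
courses = [
    ("AAA 100", "First description."),
    ("AAA 200", "No description available"),
]
out_path = os.path.join(tempfile.mkdtemp(), "out.txt")
closed = write_courses(out_path, courses)
print(closed)
```

Once the with block has exited, the file object reports itself as closed, which is the behavior the explicit close() in the question was trying to achieve by hand.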
