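I am a Python beginner. I am trying to scrape some data from my school's website. Below is the code I wrote to scrape only the news items. It works, but I would like the title, date, and paragraph to each appear on a new line. I feel that something is missing from my code, but I can't put my finger on it. I need your help.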
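from bs4 import BeautifulSoup
from urllib.request import urlopen

page = urlopen("http://www.kibabiiuniversity.ac.ke")
soup = BeautifulSoup(page, "html.parser")  # name the parser explicitly to avoid a bs4 warning
# Each news item sits in a <div class="blog-thumbnail-inside">
for i in soup.findAll("div", {"class": "blog-thumbnail-inside"}):
    print(i.get_text())
    print("----------" * 20)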
This is the HTML tag structure of the page I am trying to scrape.
<div class="blog-thumbnail-inside">
    <h2 class="blog-thumbnail-title post-widget-title-color gdl-title">
        <a href="http://www.kibabiiuniversity.ac.ke">
            Completion of fees & collection of exam cards.
        </a>
    </h2>
    <div class="blog-thumbnail-info post-widget-info-color gdl-divider">
        <div class="blog-thumbnail-date">Posted on 09 Jan 2017</div>
    </div>
    <div class="blog-thumbnail-context">
        <div class="blog-thumbnail-content">
            Download the information on fee payment and collection of exam cards..
        </div>
    </div>
</div>
Answer 0 (score: 0)
for i in soup.findAll("div", {"class": "blog-thumbnail-inside"}):
    # get_text() accepts a separator string that is used to join the pieces of text
    print(i.get_text('\n'))
    print("----------" * 20)
Output:
Final Undergraduate Examination Timetable for Semester 1 2016/2017
Posted on 11 Jan 2017
Download Undergraduate Timetable
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Vacancies for Administrative and Teaching Positions
Posted on 11 Jan 2017
Kibabii University is a fully fledged public institution of higher education and research in Kenya with a student population of 6400 and staff population of 346. The University seeks to appoint innovative individuals with experience and excellent credentials
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
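If the markup contains whitespace between tags, as in the indented HTML shown in the question, `get_text('\n')` may also emit blank lines. A possible variation, offered only as a sketch, is to pick out each field by its class and strip it individually. The function name `extract_news` and the `SAMPLE_HTML` string are made up for this illustration; the sample is copied from the question (with `&` written as `&amp;`) so the snippet runs without a network request.

```python
from bs4 import BeautifulSoup

# Hypothetical sample input: the markup from the question, used in place of a live request.
SAMPLE_HTML = """
<div class="blog-thumbnail-inside">
  <h2 class="blog-thumbnail-title post-widget-title-color gdl-title">
    <a href="http://www.kibabiiuniversity.ac.ke">
      Completion of fees &amp; collection of exam cards.
    </a>
  </h2>
  <div class="blog-thumbnail-info post-widget-info-color gdl-divider">
    <div class="blog-thumbnail-date">Posted on 09 Jan 2017</div>
  </div>
  <div class="blog-thumbnail-context">
    <div class="blog-thumbnail-content">
      Download the information on fee payment and collection of exam cards..
    </div>
  </div>
</div>
"""


def extract_news(html):
    """Return one dict per news item with its title, date, and content (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    items = []
    for block in soup.find_all("div", class_="blog-thumbnail-inside"):
        # class_ matches an element if any one of its classes equals the given name
        title = block.find("h2", class_="blog-thumbnail-title")
        date = block.find("div", class_="blog-thumbnail-date")
        content = block.find("div", class_="blog-thumbnail-content")
        items.append({
            # strip=True trims surrounding whitespace and drops empty pieces
            "title": title.get_text(strip=True) if title else "",
            "date": date.get_text(strip=True) if date else "",
            "content": content.get_text(strip=True) if content else "",
        })
    return items


for item in extract_news(SAMPLE_HTML):
    print(item["title"])
    print(item["date"])
    print(item["content"])
    print("-" * 40)
```

For the live site, the same function could presumably be given the object returned by `urlopen(...)` instead of `SAMPLE_HTML`, assuming the page still uses these class names.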