当你的表没有类或属性值时,有没有推荐的方法在python中使用BeautifulSoup 4?
我正在考虑使用Get_Text()来转储文本,但是如果我想选择单个值或将表拆分为更离散的部分,我将如何处理它?</ p>
<table cellpadding="0" cellspacing="0" id="programmeDescriptor" width="100%">
<tr>
<td>
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th colspan="1">
Awards
</th>
</tr>
<tr>
</tr>
<tr>
<td>
Ordinary Bachelor Degree
</td>
</tr>
</table>
<table border="0" cellpadding="0" cellspacing="0" width="100%">
<tr>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
Programme Code:
</th>
<td width="150">
CodeValue
</td>
</tr>
</table>
</td>
<td width="5">
</td>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
Mode of Delivery:
</th>
<td width="150">
Full Time
</td>
</tr>
</table>
</td>
<td width="5">
</td>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
No. of Semesters:
</th>
<td width="150">
6
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
NFQ Level:
</th>
<td width="150">
7
</td>
</tr>
</table>
</td>
</tr>
<tr>
<td>
<table cellpadding="5" cellspacing="0" class="borders">
<tr>
<th width="160">
Embedded Award:
</th>
<td width="150">
No
</td>
</tr>
</table>
</td>
</tr>
</table>
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th width="160">
Department:
</th>
<td>
Computing
</td>
</tr>
</table>
<div class="pageBreak">
</div>
<h3>
Programme Outcomes
</h3>
<p class="info">
On successful completion of this programme the learner will be able to :
</p>
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<th width="30">
PO1
</th>
<td class="head" colspan="2">
Knowledge - Breadth
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</tr>
<tr>
<th width="30">
PO2
</th>
<td class="head" colspan="2">
Knowledge - Kind
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO3
</th>
<td class="head" colspan="2">
Skill - Range
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO4
</th>
<td class="head" colspan="2">
Skill - Selectivity
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO5
</th>
<td class="head" colspan="2">
Competence - Context
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<tdSome block of text </td>
</tr>
<tr>
<th width="30">
PO6
</th>
<td class="head" colspan="2">
Competence - Role
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO7
</th>
<td class="head" colspan="2">
Competence - Learning to Learn
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• Some block of text
</td>
</tr>
<tr>
<th width="30">
PO8
</th>
<td class="head" colspan="2">
Competence - Insight
</td>
</tr>
<tr>
<td class="head" width="30">
</td>
<td class="head" width="30">
(a)
</td>
<td>
• The graduate will demonstrate the ability to specify, design and build an IT system or research & report on a current IT topic
</td>
</tr>
</table>
<div class="pageBreak">
</div>
<h3>
Semester Schedules
</h3>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 1 / Semester 1
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td
<a href="index.cfm/page/module/moduleId/3897" target="_blank">
Web & User Experience
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3881" target="_blank">
Software Development 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1645" target="_blank">
Computer Architecture
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2328" target="_blank">
Discrete Mathematics 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3848" target="_blank">
Business & Information Systems
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2054" target="_blank">
Learning to Learn at Third Level
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 1 / Semester 2
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3886" target="_blank">
Software Development 2
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3895" target="_blank">
Object Oriented Systems Analysis
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3875" target="_blank">
Database Fundamentals
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3874" target="_blank">
Operating Systems Fundamentals
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2330" target="_blank">
Statistics
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2527" target="_blank">
Social Media Communications
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<div class="pageBreak">
</div>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 2 / Semester 1
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3877" target="_blank">
Web & Mobile Design & Development
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3876" target="_blank">
Database Design And Programming
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3869" target="_blank">
Software Development 3
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3873" target="_blank">
Software Quality Assurance and Testing
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3629" target="_blank">
Networking 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2477" target="_blank">
Discrete Mathematics 2
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 2 / Semester 2
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3862" target="_blank">
Project
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3911" target="_blank">
Object Oriented Analysis & Design 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3877" target="_blank">
Web & Mobile Design & Development
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3630" target="_blank">
Networking 2
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3870" target="_blank">
Software Development 4
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2476" target="_blank">
Management Science
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<div class="pageBreak">
</div>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 3 / Semester 1
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3911" target="_blank">
Object Oriented Analysis & Design 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3899" target="_blank">
Operating Systems
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1721" target="_blank">
Cloud Services & Distributed Computing
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2580" target="_blank">
Innovation & Entrepreneurship
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3878" target="_blank">
Web Application Development
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1689" target="_blank">
Algorithms and Data Structures 1
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2025" target="_blank">
Logic and Problem Solving
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3896" target="_blank">
Advanced Databases
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
<table cellpadding="0" cellspacing="0" width="100%">
<tr>
<td colspan="2">
<h4>
Stage 3 / Semester 2
</h4>
</td>
</tr>
<tr>
<td colspan="2">
<table cellpadding="5" cellspacing="0" class="borders" width="100%">
<tr>
<td class="head" colspan="2">
Mandatory
</td>
</tr>
<tr>
<th width="50">
Module Code
</th>
<th>
Module Title
</th>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2465" target="_blank">
Project
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1728" target="_blank">
Algorithms and Data Structures 2
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1675" target="_blank">
Network Management
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2025" target="_blank">
Logic and Problem Solving
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/3899" target="_blank">
Operating Systems
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/2580" target="_blank">
Innovation & Entrepreneurship
</a>
</td>
</tr>
<tr>
<td>
Code
</td>
<td>
<a href="index.cfm/page/module/moduleId/1679" target="_blank">
Object Oriented Analysis & Design 2
</a>
</td>
</tr>
</table>
</td>
</tr>
</table>
</td>
</tr>
</table>
答案 0 :(得分:1)
您可以迭代某些标签。我不知道你想做什么,但如果你想获得每个<th>
标签的文本,那么只需迭代它们,然后使用get_text()
答案 1 :(得分:1)
首先,所有表的父表都有一个id属性 - 让它成为搜索的基础:
super_table = soup.find("table", id="programmeDescriptor")
然后,根据您在评论中提到的内容,它看起来像您可以通过它的标题来区分每个内部表。实现此逻辑的一个选项是找到标头,然后使用find_parent()
查找父表:
def get_table_by_header_name(super_table, header):
return super_table.find("th", text=header).find_parent("table")
用法:
desired_table = get_table_by_header_name(super_table, "Awards")