Extracting specific elements from a table with Beautiful Soup

Time: 2019-03-11 22:42:00

Tags: python

I want to extract specific data from a table using Beautiful Soup and requests. This is what part of the table looks like:

<table>
<tbody><tr>
<th nowrap="">Name / Address</th>
<th nowrap="">Phone / Fax / Web</th>
<th>Education, Qualifications, and Certifications</th>
<th>Distance</th>
</tr>
<tr valign="top">
<td nowrap="">
<strong>
Katherine
Ames
Williams,
DPM
</strong><br>
Chapel Hill Foot &amp; Ankle Assocaiates<br>
1506 E. Franklin St. #104<br>
Chapel Hill, NC
27514<br>
<a href="http://maps.google.com/?q=Chapel%20Hill%20Foot%20%26%20Ankle%20Assocaiates%2C%201506%20E%2E%20Franklin%20St%2E%20%23104%2C%20Chapel%20Hill%2C%20NC%2027514" target="_blank">Map</a><br>
</td>
<td nowrap="">
(919) 960-8858 phone<br>
(919) 960-2882 fax<br>
<a href="http://www.chapelhillfootandankle.com" target="_blank">www.chapelhillfootandankle.com</a>
</td>
<td>
ABPM - CERT. IN POD. ORTHO &amp; MED.<br>
Des Moines University College of Podiatric Medicine and Surgery, formerly College of Podiatric Medicine and Surgery, University of Osteopathic Medicine &amp; Health Sciences<br>
2012
</td>
<td align="right">
4.1
</td>
</tr>
<tr valign="top" class="altrow">
<td nowrap="">
<strong>
Jane
Elizabeth
Andersen,
DPM
</strong><br>
Chapel Hill Foot &amp; Ankle Assoc.<br>
1506 E. Franklin St. #104<br>
Chapel Hill, NC
27514<br>
<a href="http://maps.google.com/?q=Chapel%20Hill%20Foot%20%26%20Ankle%20Assoc%2E%2C%201506%20E%2E%20Franklin%20St%2E%20%23104%2C%20Chapel%20Hill%2C%20NC%2027514" target="_blank">Map</a><br>
</td>
<td nowrap="">
(919) 960-8858 phone<br>
(919) 960-2882 fax<br>
<a href="http://www.chapelhillfootandankle.com" target="_blank">www.chapelhillfootandankle.com</a>
</td>
<td>
ABFAS - CERT. IN FOOT SURGERY<br>American Association For Women Podiatrists<br>
California School of Podiatric Medicine at Samuel Merritt University, formerly California College of Podiatric Medicine<br>
1993
</td>
<td align="right">
4.1
</td>
</tr>
</tbody></table>

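The kind of details I want to get are, for example: Name: Katherine Ames Williams, DPM; Business: Chapel Hill Foot & Ankle Assocaiates; Address: 1506 E. Franklin St. #104; Phone: (919) 960-8858; Link: www.chapelhillfootandankle.com. My code is currently: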

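# `soup` is the BeautifulSoup object built from the page fetched with requests.
table = soup.find('table')
list_of_rows = []
for row in table.find_all('tr'):
    # Collect the text of every header and data cell in this row.
    list_of_cells = []
    for cell in row.find_all(["th", "td"]):
        text = cell.text
        list_of_cells.append(text)
    list_of_rows.append(list_of_cells)

# Print each row as one space-joined line.
for item in list_of_rows:
    print(' '.join(item))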

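This gives me all of the table's contents, but I don't know how to separate the details into the format I want. I would really appreciate your help.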
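
One possible way to separate the details is to stop flattening each cell with `cell.text` and instead read the pieces of each `<td>` individually. The sketch below is only an illustration run against a trimmed copy of the first row shown above. It assumes every data row follows the same layout: the name inside `<strong>`, then the business and street address as the first two text lines of the first cell, and the phone line and website link in the second cell. The helper names (`clean`, `direct_text_lines`, `labelled`, `parse_row`) and the dictionary keys are made up for this example.

```python
from bs4 import BeautifulSoup, NavigableString

# Trimmed copy of the header row and first data row from the question (sample data only).
SAMPLE_HTML = """
<table>
<tbody><tr>
<th nowrap="">Name / Address</th>
<th nowrap="">Phone / Fax / Web</th>
<th>Education, Qualifications, and Certifications</th>
<th>Distance</th>
</tr>
<tr valign="top">
<td nowrap="">
<strong>
Katherine
Ames
Williams,
DPM
</strong><br>
Chapel Hill Foot &amp; Ankle Assocaiates<br>
1506 E. Franklin St. #104<br>
Chapel Hill, NC
27514<br>
<a href="http://maps.google.com/" target="_blank">Map</a><br>
</td>
<td nowrap="">
(919) 960-8858 phone<br>
(919) 960-2882 fax<br>
<a href="http://www.chapelhillfootandankle.com" target="_blank">www.chapelhillfootandankle.com</a>
</td>
<td>ABPM - CERT. IN POD. ORTHO &amp; MED.<br>2012</td>
<td align="right">4.1</td>
</tr>
</tbody></table>
"""


def clean(text):
    """Collapse runs of whitespace (including newlines) into single spaces."""
    return " ".join(text.split())


def direct_text_lines(cell):
    """Return the non-empty text nodes that are direct children of a cell.

    Text inside nested tags such as <strong> or <a> is skipped, so the
    <br>-separated lines of the cell come back one item per line.
    """
    lines = []
    for child in cell.children:
        if isinstance(child, NavigableString):
            text = clean(child)
            if text:
                lines.append(text)
    return lines


def labelled(lines, label):
    """Return the first line ending in `label` (e.g. 'phone'), label removed."""
    for line in lines:
        if line.lower().endswith(label):
            return line[: -len(label)].strip()
    return ""


def parse_row(row):
    """Build one record from a data row, or None for rows without two <td> cells."""
    cells = row.find_all("td")
    if len(cells) < 2:
        return None  # header row (only <th> cells) or an unexpected layout
    name_cell, contact_cell = cells[0], cells[1]

    strong = name_cell.find("strong")
    link = contact_cell.find("a")
    address_lines = direct_text_lines(name_cell)  # assumed order: business, street, city

    return {
        "name": clean(strong.get_text()) if strong else "",
        "business": address_lines[0] if len(address_lines) > 0 else "",
        "address": address_lines[1] if len(address_lines) > 1 else "",
        "phone": labelled(direct_text_lines(contact_cell), "phone"),
        "link": clean(link.get_text()) if link else "",
    }


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
records = [rec for rec in map(parse_row, soup.find("table").find_all("tr")) if rec]

for rec in records:
    print("; ".join(f"{key}: {value}" for key, value in rec.items()))
```

With the real page, `SAMPLE_HTML` would be replaced by the response text fetched with requests. Rows that deviate from the assumed layout (for example, a missing business line) would put values in the wrong fields, so the positional lookups are worth checking against more of the actual data.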

0 answers:

No answers