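Here is my scenario: I want to get the child tags, together with their content, of each td tag inside a tr. I am able to get the content but not the tags, because there are too many elements inside.
The return should be:
the p tag with its content
the table element
HTML: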
<table>
  <tr>
    <td>
      <!-- first element -->
      <p> MY TEXT </p>
      <!-- end element -->
    </td>
    <td>
      <!-- second element -->
      <table>
        <tbody>
          <tr>
            <td>
              <p> MY TEXT </p>
            </td>
            <td>
              <p> MY TEXT </p>
            </td>
          </tr>
          <tr>
            <td>
              <p> MY TEXT </p>
            </td>
          </tr>
        </tbody>
      </table>
      <!-- end element -->
    </td>
  </tr>
</table>
Answer 0 (score: 1)
Code:
from bs4 import BeautifulSoup
html = '''
<table>
  <tr>
    <td>
      <!-- first element -->
      <p> MY TEXT </p>
      <!-- end element -->
    </td>
    <td>
      <!-- second element -->
      <table>
        <tbody>
          <tr>
            <td>
              <p> MY TEXT </p>
            </td>
            <td>
              <p> MY TEXT </p>
            </td>
          </tr>
          <tr>
            <td>
              <p> MY TEXT </p>
            </td>
          </tr>
        </tbody>
      </table>
      <!-- end element -->
    </td>
  </tr>
</table>
'''
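# Parse the markup with Python's built-in HTML parser
soup = BeautifulSoup(html, 'html.parser')
print("The <p> tag with its content:")
# find_all returns every <p> in the document, including those in the nested table
print(soup.find_all('p'))
print("\nThe <table> element:")
# find returns the first (outermost) <table>; prettify re-indents it
print(soup.find('table').prettify())
Output:
The <p> tag with its content:
[<p> MY TEXT </p>, <p> MY TEXT </p>, <p> MY TEXT </p>, <p> MY TEXT </p>]

The <table> element: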
<table>
 <tr>
  <td>
   <!-- first element -->
   <p>
    MY TEXT
   </p>
   <!-- end element -->
  </td>
  <td>
   <!-- second element -->
   <table>
    <tbody>
     <tr>
      <td>
       <p>
        MY TEXT
       </p>
      </td>
      <td>
       <p>
        MY TEXT
       </p>
      </td>
     </tr>
     <tr>
      <td>
       <p>
        MY TEXT
       </p>
      </td>
     </tr>
    </tbody>
   </table>
   <!-- end element -->
  </td>
 </tr>
</table>
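If what is wanted is only the direct children of each outer td (the p in the first cell and the nested table in the second), rather than every p in the document, something along these lines might work. This is a minimal sketch, assuming html.parser is used (it does not insert a tbody around the outer row) and that the first tr in the document is the outer one. The helper name direct_child_tags is made up for illustration and is not part of Beautiful Soup.

```python
from bs4 import BeautifulSoup

# Same structure as the HTML in the question, written more compactly.
html = """
<table>
  <tr>
    <td><!-- first element --><p> MY TEXT </p><!-- end element --></td>
    <td><!-- second element -->
      <table><tbody>
        <tr><td><p> MY TEXT </p></td><td><p> MY TEXT </p></td></tr>
        <tr><td><p> MY TEXT </p></td></tr>
      </tbody></table>
    <!-- end element --></td>
  </tr>
</table>
"""


def direct_child_tags(markup):
    """Hypothetical helper: for each <td> of the first <tr>, list its direct child tags."""
    soup = BeautifulSoup(markup, "html.parser")
    # Assumption: the first <tr> in document order is the outer row.
    outer_row = soup.find("tr")
    cells = []
    # recursive=False limits the search to direct children, so the
    # <td> elements of the nested table are not picked up here.
    for td in outer_row.find_all("td", recursive=False):
        # find_all(True) matches any tag; comments and whitespace strings are skipped.
        cells.append(td.find_all(True, recursive=False))
    return cells


for index, tags in enumerate(direct_child_tags(html)):
    for tag in tags:
        # Each tag still carries its full content, e.g. str(tag) or tag.prettify().
        print(index, tag.name)
```

Each returned item is a regular Tag, so the tag and its content stay together; the p elements inside the nested table remain reachable through the table tag from the second cell.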