Question

Good afternoon, friends. I am using BeautifulSoup to parse text out of HTML with roughly this structure:

<div>
  <p>
     Some text 1
  </p>
  <p>
     Some text 2
  </p>
  <tbody>
    <tr>
      <td>
        <p>
          Some table text 1
        </p>
      </td>
      <td>
        <p>
          Some table text 2
        </p>
      </td>
    </tr>
  </tbody>
  <p>
     Some text 3
  </p>
</div>

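I understand how to parse the table itself, but how do I parse the whole block so that the order of the text and the table is preserved?

I need to get something like this: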

Some text 1
Some text 2
-------------------
|Some table text 1|
|Some table text 2|
-------------------
Some text 3
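
A minimal sketch of how output in this shape could be produced, assuming the sample HTML above is parsed with the built-in "html.parser" (other parsers such as lxml or html5lib may drop or move a <tbody> that is not inside a <table>). The helper name render and the one-cell-per-row layout are illustrative assumptions, not part of BeautifulSoup:

```python
from bs4 import BeautifulSoup

html = """
<div>
  <p> Some text 1 </p>
  <p> Some text 2 </p>
  <tbody>
    <tr>
      <td><p> Some table text 1 </p></td>
      <td><p> Some table text 2 </p></td>
    </tr>
  </tbody>
  <p> Some text 3 </p>
</div>
"""


def render(container):
    """Hypothetical helper: walk only the direct children of `container`."""
    lines = []
    # recursive=False restricts the search to first-level children,
    # so the <p> tags nested inside <tbody> are not visited twice.
    for tag in container.find_all(["p", "tbody"], recursive=False):
        if tag.name == "tbody":
            # One line per cell, wrapped in pipes.
            cells = ["|" + td.get_text(strip=True) + "|" for td in tag.find_all("td")]
            border = "-" * max((len(c) for c in cells), default=0)
            lines.extend([border, *cells, border])
        else:
            lines.append(tag.get_text(strip=True))
    return "\n".join(lines)


soup = BeautifulSoup(html, "html.parser")
text = render(soup.find("div"))
print(text)
```

The border width here is simply the length of the longest cell line; a real table with several rows and columns would need its own layout rules.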

I tried this:

for tag in tags.find_all(['p', 'tbody']):

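But of course this returns all the <p> tags, and then the same <p> tags again from inside <tbody>, so the information is duplicated.

I need to get the tags in a list like this: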

[<p>Some text 1</p>, <p>Some text 2</p>,
 <tbody><tr><td><p>Some table text 1</p></td><td><p>Some table text 2</p></td></tr></tbody>,
 <p>Some text 3</p>]
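
One possible way to get a list of only the first-level tags is the recursive=False argument of find_all, which limits the search to direct children. This is a sketch under assumptions: the markup is a shortened copy of the sample above, the variable names are illustrative, and the built-in "html.parser" is used because it keeps a <tbody> that has no surrounding <table>, which other parsers may not do.

```python
from bs4 import BeautifulSoup

html = """
<div>
  <p>Some text 1</p>
  <p>Some text 2</p>
  <tbody>
    <tr>
      <td><p>Some table text 1</p></td>
      <td><p>Some table text 2</p></td>
    </tr>
  </tbody>
  <p>Some text 3</p>
</div>
"""

soup = BeautifulSoup(html, "html.parser")
div = soup.find("div")

# Only direct children of <div> are considered, so the <p> tags
# inside <tbody> appear once, as part of the <tbody> element itself.
top_level = div.find_all(["p", "tbody"], recursive=False)

names = [tag.name for tag in top_level]
print(names)
```

Each element of top_level is still a full Tag, so the <tbody> entry can be processed separately with its own find_all("td") call.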

BeautifulSoup: iterating over parent (first-level) tags

0 answers: