Question

I'm scraping data from an HTML table in this format:

<table>

    <tr>
        <th>Name</th>
        <th>Date</th>
        <th>Number</th>
        <th>Address</th>

    </tr>

    <tr> 1

        <td> Name-1 </td>
        <td> Date-1 </td>
        <td> Number-1 </td>
        <td> Address-1 </td>

    </tr>

    <tr> 2

        <td> Name-2 </td>
        <td> Date-2 </td>
        <td> Number-2 </td>
        <td> Address-2 </td>

    </tr>

</table>

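This is the only table on the page. I want to build lists in which each TD value is stored under its corresponding TH header, and then save the result as a CSV. The real data does not contain "-number" values; those are just for illustration. There are hundreds of table rows, and all of them are formatted this way.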

Basically, I want the name to be the first TD cell in each TR row, the date the second, and so on.

I can't seem to find a way to do this with Python 3 and BeautifulSoup4. I know there is a way; I'm just too new to this.

Thanks, everyone, for the help. I'm learning a lot.

Answer 1

Assuming the data is uniform (every data row has exactly four TD cells), the following basic example should work:

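from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")  # html: the page source as a string
table_rows = soup.find_all("tr")  # list of all <tr> tags
for row in table_rows:
    cells = row.find_all("td")  # list of all <td> tags within a row
    if not cells:  # skip rows without <td> elements (e.g. the <th> header row)
        continue
    # unpack the stripped text of the four <td> tags into separate variables
    name, date, number, address = (cell.get_text(strip=True) for cell in cells)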
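To go the rest of the way toward the CSV the question asks for, one option is to read the header names from the `<th>` cells, pair each row's `<td>` texts with them via `zip`, and hand the resulting dicts to the standard library's `csv.DictWriter`. The sketch below assumes the single-table layout from the question; `HTML`, `table_to_dicts`, and `write_csv` are hypothetical names used only for illustration, and the sample values are the question's placeholders.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical sample markup mirroring the question's table (placeholder values).
HTML = """
<table>
  <tr><th>Name</th><th>Date</th><th>Number</th><th>Address</th></tr>
  <tr><td> Name-1 </td><td> Date-1 </td><td> Number-1 </td><td> Address-1 </td></tr>
  <tr><td> Name-2 </td><td> Date-2 </td><td> Number-2 </td><td> Address-2 </td></tr>
</table>
"""


def table_to_dicts(html):
    """Return the header names and one dict per data row keyed by them."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")  # assumes the page has only one table
    headers = [th.get_text(strip=True) for th in table.find_all("th")]
    records = []
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        if not cells:  # the header row has <th> cells but no <td> cells
            continue
        values = [td.get_text(strip=True) for td in cells]
        # zip pairs the n-th header with the n-th cell (extra cells are dropped)
        records.append(dict(zip(headers, values)))
    return headers, records


def write_csv(headers, records, fileobj):
    """Write the records as CSV to an open text file, header line first."""
    writer = csv.DictWriter(fileobj, fieldnames=headers)
    writer.writeheader()
    writer.writerows(records)


headers, records = table_to_dicts(HTML)
buffer = io.StringIO()  # in-memory stand-in for a real file
write_csv(headers, records, buffer)
print(buffer.getvalue())
```

To write a real file instead, pass a handle opened with `open("table.csv", "w", newline="", encoding="utf-8")` (the filename is only an example); `newline=""` is what the `csv` module documentation recommends so line endings are not doubled on Windows. Like the answer above, this relies on the rows being uniform: `zip` silently drops cells beyond the number of headers.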

Python 3 BeautifulSoup4: selecting specific tags from each tag
