Copy a table header to the footer with BeautifulSoup

Date: 2018-06-12 19:04:07

Tags: python beautifulsoup

My HTML table has a <thead> but no <tfoot>. I need to use BeautifulSoup to copy the header into a footer.

The table looks like this:

<table id="example" class="display" style="width:100%">
    <thead>
        <tr>
            <th>Name</th>
            <th>Position</th>
            <th>Office</th>
            <th>Age</th>
            <th>Start date</th>
            <th>Salary</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Ashton Cox</td>
            <td>Junior Technical Author</td>
            <td>San Francisco</td>
            <td>66</td>
            <td>2009/01/12</td>
            <td>$86,000</td>
        </tr>          
    </tbody>
</table>

However, I need it to look like this:

<table id="example" class="display" style="width:100%">
    <thead>
        <tr>
            <th>Name</th>
            <th>Position</th>
            <th>Office</th>
            <th>Age</th>
            <th>Start date</th>
            <th>Salary</th>
        </tr>
    </thead>
    <tbody>
        <tr>
            <td>Ashton Cox</td>
            <td>Junior Technical Author</td>
            <td>San Francisco</td>
            <td>66</td>
            <td>2009/01/12</td>
            <td>$86,000</td>
        </tr>          
    </tbody>
    <tfoot>
        <tr>
            <th>Name</th>
            <th>Position</th>
            <th>Office</th>
            <th>Age</th>
            <th>Start date</th>
            <th>Salary</th>
        </tr>
    </tfoot>
</table>

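I think I need to use insert_after, but I'm having trouble seeing how to copy the contents of the <thead>, create a new <tfoot>, and insert the <tr><th> values. I started by trying to iterate over the header cells, create tags, and call insert_after: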

table_headers = soup.find_all('th')

Any ideas?

1 Answer:

Answer 0: (score: 2)

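Does this do what you want? I was surprised to find that inserting the soup.thead.tr object removes it from the <thead> element. That happens because a tag can occupy only one place in the tree, so appending it elsewhere moves it. Note the copy().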

    from copy import copy

    from bs4 import BeautifulSoup

    orig = """<table id="example" class="display" style="width:100%">
        <thead>
            <tr>
                <th>Name</th>
                <th>Position</th>
                <th>Office</th>
                <th>Age</th>
                <th>Start date</th>
                <th>Salary</th>
            </tr>
        </thead>
        <tbody>
            <tr>
                <td>Ashton Cox</td>
                <td>Junior Technical Author</td>
                <td>San Francisco</td>
                <td>66</td>
                <td>2009/01/12</td>
                <td>$86,000</td>
            </tr>
        </tbody>
    </table>
    """

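    # Name the parser explicitly so the result does not depend on which parsers are installed
    soup = BeautifulSoup(orig, 'html.parser')
    tfoot = soup.new_tag('tfoot')
    # Note: without copy(), append() moves the <tr> out of <thead> instead of duplicating it
    tfoot.append(copy(soup.thead.tr))
    # Place the new <tfoot> right after <tbody>
    soup.tbody.insert_after(tfoot)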
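
If the page has more than one table, the same idea can be wrapped in a small function. The following is only a sketch under a few assumptions: the name `add_footers` is made up for illustration, each <thead> is a direct child of its <table>, and tables that already have a <tfoot> should be left alone.

```python
from copy import copy

from bs4 import BeautifulSoup


def add_footers(soup):
    """Give every table that lacks a <tfoot> a copy of its <thead> rows.

    Hypothetical helper for illustration; modifies `soup` in place and returns it.
    """
    for table in soup.find_all("table"):
        # Only look at direct children so nested tables are handled on their own turn
        thead = table.find("thead", recursive=False)
        if thead is None or table.find("tfoot", recursive=False) is not None:
            continue  # nothing to copy, or a footer already exists

        tfoot = soup.new_tag("tfoot")
        for row in thead.find_all("tr", recursive=False):
            # copy() duplicates the row; appending the original would move it
            tfoot.append(copy(row))

        # Appending to the table puts the footer after <tbody>
        table.append(tfoot)
    return soup
```

Calling `add_footers(BeautifulSoup(orig, 'html.parser'))` on the table above should leave the <thead> untouched and add a matching <tfoot> at the end of the table.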