How do I use BeautifulSoup to get the content that follows a specific text on a page?

Posted: 2017-02-04 19:01:43

Tags: python html python-3.x web-scraping beautifulsoup

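I'm having a hard time extracting some specific href links from this HTML, along with the text content inside table td tags, such as dates and descriptions.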

Here is the web page link. You have to click DFP to get to this page.

I only want the information that follows the text DFP - ENET - Ativo.


Here is the HTML:

html_source = """
<!DOCTYPE html>
<html>
<head>
    <title></title>
</head>
<body>
    <table align="center" border="0" cellpadding="0" cellspacing="0" width="640">
        <tbody>
            <tr>
                <td align="right" colspan="3"><img border="0" src="images/titulos_ciaslist_info_sobre_empr_IPEV.gif"><br>
                <br>
                <br>
                <br></td>
            </tr>
            <tr>
                <td colspan="3"><font class="TextoEx"><b>Código CVM : 001023<br>
                Razão Social : BANCO DO BRASIL S.A.<br>
                CNPJ : 00.000.000/0001-91<br>
                <br>
                <br>
                <br>
                <br></b></font></td>
            </tr>
            <tr class="LegendaPequenaC">
                <td bgcolor="#F7F7F7" style="COLOR : 'olivedrab'" width="33%">9 documento(s) encontrado(s)</td>
                <td align="center" bgcolor="#F7F7F7" style="COLOR : 'olivedrab'" width="33%">Exibindo 1 a 9</td>
                <td align="right" bgcolor="#F7F7F7" style="COLOR : 'olivedrab'" width="33%"></td>
            </tr>
            <tr valign="top">
                <td colspan="3">
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%"></table>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%">
                        <tbody>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7" width="20%"><b>Categoria</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" width="50%">DFP - ENET - Ativo</td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('57534','CONSULTA')" style="COLOR : 'olivedrab'">Consulta</a></b></td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('57534','DOWNLOAD')" style="COLOR : 'olivedrab'">Download</a></b></td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Data Encerramento</b></td>
                                <td bgcolor="#FFFFFF">31/12/2015</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Data Entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" nowrap>02/06/2016 11:44</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Tipo Apresentação</b></td>
                                <td bgcolor="#FFFFFF">Reapresentação Espontânea</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Versão</b></td>
                                <td bgcolor="#FFFFFF" colspan="3" nowrap>3.0</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Prot. de entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="4">
                                    <a href="javascript:fVisualizaProtocolo_ENET('57534','CONSULTA')"><u>001023DFP311220150300057534-67</u></a>
                                </td>
                            </tr>
                        </tbody>
                    </table><br>
                    <br>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%"></table>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%">
                        <tbody>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7" width="20%"><b>Categoria</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" width="50%">DFP - ENET - Inativo</td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('54536','CONSULTA')" style="COLOR : 'olivedrab'">Consulta</a></b></td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('54536','DOWNLOAD')" style="COLOR : 'olivedrab'">Download</a></b></td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Data Encerramento</b></td>
                                <td bgcolor="#FFFFFF">31/12/2015</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Data Entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" nowrap>28/03/2016 22:09</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Tipo Apresentação</b></td>
                                <td bgcolor="#FFFFFF">Reapresentação Espontânea</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Versão</b></td>
                                <td bgcolor="#FFFFFF" colspan="3" nowrap>2.0</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Prot. de entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="4">
                                    <a href="javascript:fVisualizaProtocolo_ENET('54536','CONSULTA')"><u>001023DFP311220150200054536-63</u></a>
                                </td>
                            </tr>
                        </tbody>
                    </table><br>
                    <br>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%"></table>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%">
                        <tbody>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7" width="20%"><b>Categoria</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" width="50%">DFP - ENET - Inativo</td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('53614','CONSULTA')" style="COLOR : 'olivedrab'">Consulta</a></b></td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('53614','DOWNLOAD')" style="COLOR : 'olivedrab'">Download</a></b></td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Data Encerramento</b></td>
                                <td bgcolor="#FFFFFF">31/12/2015</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Data Entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" nowrap>25/02/2016 08:29</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Tipo Apresentação</b></td>
                                <td bgcolor="#FFFFFF">Apresentação</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Versão</b></td>
                                <td bgcolor="#FFFFFF" colspan="3" nowrap>1.0</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Prot. de entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="4">
                                    <a href="javascript:fVisualizaProtocolo_ENET('53614','CONSULTA')"><u>001023DFP311220150100053614-77</u></a>
                                </td>
                            </tr>
                        </tbody>
                    </table><br>
                    <br>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%"></table>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%">
                        <tbody>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7" width="20%"><b>Categoria</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" width="50%">DFP - ENET - Ativo</td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('45354','CONSULTA')" style="COLOR : 'olivedrab'">Consulta</a></b></td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('45354','DOWNLOAD')" style="COLOR : 'olivedrab'">Download</a></b></td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Data Encerramento</b></td>
                                <td bgcolor="#FFFFFF">31/12/2014</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Data Entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" nowrap>27/03/2015 08:18</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Tipo Apresentação</b></td>
                                <td bgcolor="#FFFFFF">Reapresentação Espontânea</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Versão</b></td>
                                <td bgcolor="#FFFFFF" colspan="3" nowrap>2.0</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Prot. de entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="4">
                                    <a href="javascript:fVisualizaProtocolo_ENET('45354','CONSULTA')"><u>001023DFP311220140200045354-67</u></a>
                                </td>
                            </tr>
                        </tbody>
                    </table><br>
                    <br>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%"></table>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%">
                        <tbody>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7" width="20%"><b>Categoria</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" width="50%">DFP - ENET - Inativo</td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('43994','CONSULTA')" style="COLOR : 'olivedrab'">Consulta</a></b></td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('43994','DOWNLOAD')" style="COLOR : 'olivedrab'">Download</a></b></td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Data Encerramento</b></td>
                                <td bgcolor="#FFFFFF">31/12/2014</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Data Entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" nowrap>11/02/2015 08:24</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Tipo Apresentação</b></td>
                                <td bgcolor="#FFFFFF">Apresentação</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Versão</b></td>
                                <td bgcolor="#FFFFFF" colspan="3" nowrap>1.0</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Prot. de entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="4">
                                    <a href="javascript:fVisualizaProtocolo_ENET('43994','CONSULTA')"><u>001023DFP311220140100043994-74</u></a>
                                </td>
                            </tr>
                        </tbody>
                    </table><br>
                    <br>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%"></table>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%">
                        <tbody>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7" width="20%"><b>Categoria</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" width="50%">DFP - ENET - Ativo</td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('41430','CONSULTA')" style="COLOR : 'olivedrab'">Consulta</a></b></td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('41430','DOWNLOAD')" style="COLOR : 'olivedrab'">Download</a></b></td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Data Encerramento</b></td>
                                <td bgcolor="#FFFFFF">31/12/2013</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Data Entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" nowrap>25/09/2014 18:24</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Tipo Apresentação</b></td>
                                <td bgcolor="#FFFFFF">Reapresentação Espontânea</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Versão</b></td>
                                <td bgcolor="#FFFFFF" colspan="3" nowrap>4.0</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Prot. de entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="4">
                                    <a href="javascript:fVisualizaProtocolo_ENET('41430','CONSULTA')"><u>001023DFP311220130400041430-77</u></a>
                                </td>
                            </tr>
                        </tbody>
                    </table><br>
                    <br>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%"></table>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%">
                        <tbody>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7" width="20%"><b>Categoria</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" width="50%">DFP - ENET - Inativo</td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('35587','CONSULTA')" style="COLOR : 'olivedrab'">Consulta</a></b></td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('35587','DOWNLOAD')" style="COLOR : 'olivedrab'">Download</a></b></td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Data Encerramento</b></td>
                                <td bgcolor="#FFFFFF">31/12/2013</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Data Entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" nowrap>27/03/2014 09:55</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Tipo Apresentação</b></td>
                                <td bgcolor="#FFFFFF">Reapresentação Espontânea</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Versão</b></td>
                                <td bgcolor="#FFFFFF" colspan="3" nowrap>3.0</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Prot. de entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="4">
                                    <a href="javascript:fVisualizaProtocolo_ENET('35587','CONSULTA')"><u>001023DFP311220130300035587-73</u></a>
                                </td>
                            </tr>
                        </tbody>
                    </table><br>
                    <br>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%"></table>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%">
                        <tbody>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7" width="20%"><b>Categoria</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" width="50%">DFP - ENET - Inativo</td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('34667','CONSULTA')" style="COLOR : 'olivedrab'">Consulta</a></b></td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('34667','DOWNLOAD')" style="COLOR : 'olivedrab'">Download</a></b></td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Data Encerramento</b></td>
                                <td bgcolor="#FFFFFF">31/12/2013</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Data Entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" nowrap>19/02/2014 17:47</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Tipo Apresentação</b></td>
                                <td bgcolor="#FFFFFF">Reapresentação Espontânea</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Versão</b></td>
                                <td bgcolor="#FFFFFF" colspan="3" nowrap>2.0</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Prot. de entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="4">
                                    <a href="javascript:fVisualizaProtocolo_ENET('34667','CONSULTA')"><u>001023DFP311220130200034667-63</u></a>
                                </td>
                            </tr>
                        </tbody>
                    </table><br>
                    <br>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%"></table>
                    <table align="center" bgcolor="#BEBEBE" border="0" cellpadding="0" cellspacing="1" width="95%">
                        <tbody>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7" width="20%"><b>Categoria</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" width="50%">DFP - ENET - Inativo</td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('34513','CONSULTA')" style="COLOR : 'olivedrab'">Consulta</a></b></td>
                                <td align="center" bgcolor="#F7F7F7" class="LegendaPequenaC" width="15%"><b><a href="javascript:fVisualizaArquivo_ENET('34513','DOWNLOAD')" style="COLOR : 'olivedrab'">Download</a></b></td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Data Encerramento</b></td>
                                <td bgcolor="#FFFFFF">31/12/2013</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Data Entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="2" nowrap>13/02/2014 08:54</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Tipo Apresentação</b></td>
                                <td bgcolor="#FFFFFF">Apresentação</td>
                                <td bgcolor="#F7F7F7" width="15%"><b>Versão</b></td>
                                <td bgcolor="#FFFFFF" colspan="3" nowrap>1.0</td>
                            </tr>
                            <tr class="TableOptions">
                                <td bgcolor="#F7F7F7"><b>Prot. de entrega</b></td>
                                <td bgcolor="#FFFFFF" colspan="4">
                                    <a href="javascript:fVisualizaProtocolo_ENET('34513','CONSULTA')"><u>001023DFP311220130100034513-71</u></a>
                                </td>
                            </tr>
                        </tbody>
                    </table><br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                    <br>
                </td>
            </tr>
            <tr>
                <td></td>
            </tr>
            <tr class="LegendaPequenaC">
                <td bgcolor="#F7F7F7" style="COLOR : 'olivedrab'" width="33%">9 documento(s) encontrado(s)</td>
                <td align="center" bgcolor="#F7F7F7" style="COLOR : 'olivedrab'" width="33%">Exibindo 1 a 9</td>
                <td align="right" bgcolor="#F7F7F7" style="COLOR : 'olivedrab'" width="33%"></td>
            </tr>
            <tr>
                <td></td>
            </tr>
        </tbody>
    </table>
</body>
</html>
"""

Here is my code:

from bs4 import BeautifulSoup

#insert html_source here
soup = BeautifulSoup(html_source, 'html.parser')
table = soup.find('table')

tds = table.find_all('td', {'colspan':'2'})
for td in tds:
    if td.text == 'DFP - ENET - Ativo':
        print(td.find_next('href'))
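
One thing worth noting about the loop above: find_next('href') searches for a tag named href, and no such tag exists, so it returns None instead of a link. The href is an attribute of the <a> tag. A minimal sketch on a hypothetical one-row fragment (not the full page) that finds the next <a> tag and reads its href attribute instead; string= is the Beautiful Soup 4.4.0+ name for the older text= argument:

```python
from bs4 import BeautifulSoup

# Hypothetical one-row fragment modeled on the first "Ativo" table in html_source.
html = """
<table><tbody>
<tr>
<td colspan="2">DFP - ENET - Ativo</td>
<td><b><a href="javascript:fVisualizaArquivo_ENET('57534','CONSULTA')">Consulta</a></b></td>
</tr>
</tbody></table>
"""

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td", string="DFP - ENET - Ativo")

# find_next("href") looks for a <href> *tag*; there is none, so this is None.
print(td.find_next("href"))

# The link lives in the href attribute of the next <a> tag.
link = td.find_next("a")["href"]
print(link)
```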

When I tried print(td.next_sibling()), I got the following TypeError:

TypeError: 'NavigableString' object is not callable
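
The error arises because next_sibling is an attribute, not a method, and in this markup the node right after a <td> is the whitespace text between tags, a NavigableString. Adding () tries to call that string. A small sketch on a hypothetical two-cell fragment that reproduces the error and uses find_next_sibling('td'), which skips text nodes, instead:

```python
from bs4 import BeautifulSoup, NavigableString

# Hypothetical two-cell fragment; the newline between the <td> tags matters.
html = """<table><tr>
<td><b>Data Encerramento</b></td>
<td>31/12/2015</td>
</tr></table>"""

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td")

# next_sibling is an attribute holding the "\n" text node between the two <td> tags.
sib = td.next_sibling
print(isinstance(sib, NavigableString))

# Calling it, as in td.next_sibling(), reproduces the TypeError from the question.
try:
    sib()
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# find_next_sibling("td") skips text nodes and returns the next <td> element.
value = td.find_next_sibling("td").get_text(strip=True)
print(value)
```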

I have read this question and this one, but I couldn't get my code to work.

If possible, I'd like the output for this particular page (which has three Ativo entries) in the following format:

[("javascript:fVisualizaArquivo_ENET('57534','CONSULTA')", "31/12/2015", "02/06/2016 11:44", "Reapresentação Espontânea", "3.0"), ("javascript:fVisualizaArquivo_ENET('45354','CONSULTA')", "31/12/2014", "27/03/2015 08:18", "Reapresentação Espontânea", "2.0"), ("javascript:fVisualizaArquivo_ENET('41430','CONSULTA')", "31/12/2013", "25/09/2014 18:24", "Reapresentação Espontânea", "4.0")]

1 answer:

Answer 0 (score: 1)

from bs4 import BeautifulSoup

#insert html_source here
soup = BeautifulSoup(html_source, 'html.parser')
# Every "Download" link; soup(...) is shorthand for soup.find_all(...)
links = [a['href'] for a in soup('a', text='Download')]
# For each label in a <b> tag, take the text of the next <td>, which holds the value
Encerramento = [i.find_next('td').text for i in soup('b', text='Data Encerramento')]
Entrega = [i.find_next('td').text for i in soup('b', text='Data Entrega')]
Tipo = [i.find_next('td').text for i in soup('b', text='Tipo Apresentação')]
Versão = [i.find_next('td').text for i in soup('b', text='Versão')]
# Pair up the i-th element of each list: one tuple per document
for i in zip(links, Encerramento, Entrega, Tipo, Versão):
    print(i)

Output:

("javascript:fVisualizaArquivo_ENET('57534','DOWNLOAD')", '31/12/2015', '02/06/2016 11:44', 'Reapresentação Espontânea', '3.0')
("javascript:fVisualizaArquivo_ENET('54536','DOWNLOAD')", '31/12/2015', '28/03/2016 22:09', 'Reapresentação Espontânea', '2.0')
("javascript:fVisualizaArquivo_ENET('53614','DOWNLOAD')", '31/12/2015', '25/02/2016 08:29', 'Apresentação', '1.0')
("javascript:fVisualizaArquivo_ENET('45354','DOWNLOAD')", '31/12/2014', '27/03/2015 08:18', 'Reapresentação Espontânea', '2.0')
("javascript:fVisualizaArquivo_ENET('43994','DOWNLOAD')", '31/12/2014', '11/02/2015 08:24', 'Apresentação', '1.0')
("javascript:fVisualizaArquivo_ENET('41430','DOWNLOAD')", '31/12/2013', '25/09/2014 18:24', 'Reapresentação Espontânea', '4.0')
("javascript:fVisualizaArquivo_ENET('35587','DOWNLOAD')", '31/12/2013', '27/03/2014 09:55', 'Reapresentação Espontânea', '3.0')
("javascript:fVisualizaArquivo_ENET('34667','DOWNLOAD')", '31/12/2013', '19/02/2014 17:47', 'Reapresentação Espontânea', '2.0')
("javascript:fVisualizaArquivo_ENET('34513','DOWNLOAD')", '31/12/2013', '13/02/2014 08:54', 'Apresentação', '1.0')

Use each label's text as an anchor, then find the next td tag. That gives five lists, and zip combines them into one tuple per document.
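
The output above covers all nine documents and uses the DOWNLOAD links, while the question asks only for the three Ativo entries with the CONSULTA link. A minimal sketch, assuming each document stays in its own inner <table> as in html_source: anchor on the category cell, then read the label/value pairs only within that table. The trimmed fragment and the names LABELS and extract_active are hypothetical; string= is the Beautiful Soup 4.4.0+ name for text=.

```python
from bs4 import BeautifulSoup

# Hypothetical trimmed fragment: one "Ativo" and one "Inativo" block, laid out
# like the inner tables of html_source. Pass the real html_source instead.
html = """
<table><tbody>
<tr><td><b>Categoria</b></td><td colspan="2">DFP - ENET - Ativo</td>
<td><b><a href="javascript:fVisualizaArquivo_ENET('57534','CONSULTA')">Consulta</a></b></td>
<td><b><a href="javascript:fVisualizaArquivo_ENET('57534','DOWNLOAD')">Download</a></b></td></tr>
<tr><td><b>Data Encerramento</b></td><td>31/12/2015</td>
<td><b>Data Entrega</b></td><td colspan="2">02/06/2016 11:44</td></tr>
<tr><td><b>Tipo Apresentação</b></td><td>Reapresentação Espontânea</td>
<td><b>Versão</b></td><td colspan="3">3.0</td></tr>
</tbody></table>
<table><tbody>
<tr><td><b>Categoria</b></td><td colspan="2">DFP - ENET - Inativo</td>
<td><b><a href="javascript:fVisualizaArquivo_ENET('54536','CONSULTA')">Consulta</a></b></td>
<td><b><a href="javascript:fVisualizaArquivo_ENET('54536','DOWNLOAD')">Download</a></b></td></tr>
<tr><td><b>Data Encerramento</b></td><td>31/12/2015</td>
<td><b>Data Entrega</b></td><td colspan="2">28/03/2016 22:09</td></tr>
<tr><td><b>Tipo Apresentação</b></td><td>Reapresentação Espontânea</td>
<td><b>Versão</b></td><td colspan="3">2.0</td></tr>
</tbody></table>
"""

# Labels whose values we want, in the order of the requested tuple.
LABELS = ["Data Encerramento", "Data Entrega", "Tipo Apresentação", "Versão"]


def extract_active(html_source):
    soup = BeautifulSoup(html_source, "html.parser")
    results = []
    # Anchor on each "Ativo" category cell; "Inativo" cells do not match exactly.
    for cat_td in soup.find_all("td", string="DFP - ENET - Ativo"):
        # Restrict the field lookups to this document's own table.
        table = cat_td.find_parent("table")
        # The CONSULTA link is the next <a> whose text is "Consulta".
        link = cat_td.find_next("a", string="Consulta")["href"]
        fields = []
        for label in LABELS:
            # Same idea as the answer: the value is in the <td> after the <b> label.
            b = table.find("b", string=label)
            fields.append(b.find_next("td").get_text(strip=True))
        results.append((link, *fields))
    return results


print(extract_active(html))
```

Because every field is read relative to its own table, a missing label in one document cannot shift the values of the others, which is a risk with zipping five independent lists. Applied to the full html_source, this should yield the three tuples listed in the question.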