Extracting data from an HTML email when it is not in a table

Time: 2017-08-29 18:35:47

Tags: python html beautifulsoup

I have HTML data embedded in emails. Until now the data has always been in a table, but this time it is not, so I am having trouble capturing all of it.

Here is how the data appears in the email:

[image: screenshot of the email]

This is the HTML:

</o:shapelayout></xml><![endif]--></head>
<body lang=EN-US link=blue vlink=purple>
<div class=WordSection1>
    <p class=MsoNormal><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'><o:p>&nbsp;</o:p></span>
    </p>
    <p class=MsoNormal><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'><o:p>&nbsp;</o:p></span>
    </p>
    <div>
        <p class=MsoNormal><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'>&nbsp;<o:p></o:p></span>
        </p>
        <p class=MsoNormal><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'>-----<o:p></o:p></span>
        </p>
        <p class=MsoNormal><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Eric Brazer Jr. </span><i><span
                style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p></o:p></span></i>
        </p>
        <p class=MsoNormal><i><span
                style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Manager, GB Cod Fixed Gear Sector</span></i><span
                style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
        <p class=MsoNormal><b><span
                style='font-size:11.0pt;font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen's Alliance</span></b><span
                style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
        <p class=MsoNormal><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>1566 Main Street, Chatham, MA 02633</span><span
                style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
        <p class=MsoNormal><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>(508) 945-2432 x105&nbsp; --&nbsp; Fax: (508) 945-0981</span><span
                style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
        <p class=MsoNormal><span
                style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#002776'><a
                href="mailto:melissa@capecodfishermen.org"><span style='color:#002776'>eric@capecodfishermen.org</span></a></span><span
                style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
        <p class=MsoNormal><span
                style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#002776'><a
                href="www.capecodfishermen.org"><span
                style='color:#002776'>www.capecodfishermen.org</span></a></span><span
                style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
        <p class=MsoNormal><b><span
                style='font-size:11.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>Small Boats.&nbsp; Big Ideas.</span></b><span
                style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
    </div>
    <p class=MsoNormal><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'><o:p>&nbsp;</o:p></span>
    </p>
    <div>
        <div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'>
            <p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
                    style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> Stephanie Rafael [mailto:nbsector9@gmail.com] <br><b>Sent:</b> Thursday, May 23, 2013 2:06 PM<br><b>To:</b> Linda McCann<br><b>Cc:</b> Aaron Dority; Eric Brazer; John Haran; Rob @ NEFS III; XI and XII NEFS INC; Hank Soule; Ben Martens; Jim Reardon; Vito Giacalone; NEFS V; calberto@luzofuel.com; DaveLeveille, NEFS II<br><b>Subject:</b> NEFS IX packages available<o:p></o:p></span>
            </p>
        </div>
    </div>
    <p class=MsoNormal>
        <o:p>&nbsp;</o:p>
    </p>
    <div>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Hello All, <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Below are two package deals that
            are available to lease from NEFS IX. Please let me know if there is any interest. <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Thanks, <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Stephanie <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b>Package #1</b> <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBE Cod&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;701 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBW Cod&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;13,070 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBE Hadd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;14,100 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBW Hadd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;84,296 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GB YT&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;671 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>SNE YT&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;153 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GOM YT&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2,371 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Plaice&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2,820 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Witch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1,057 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GB Winter&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;13,316 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Redfish&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;122 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Hake&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;184 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Pollock&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;7,427 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>SNE Winter&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;28,935 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b>Asking Price $37,556.65</b> <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>
            <o:p>&nbsp;</o:p>
        </p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b>Package #2
            <o:p></o:p>
        </b></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBE Cod&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;432 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBW Cod&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;8,059 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBE Hadd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;14,629 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBW Hadd&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;87,454 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GB YT&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1,817 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>SNE YT&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;76 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GOM YT&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;200 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Plaice&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;2,043 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Witch&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;1,413 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GB Winter&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;23,784 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Redfish&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;122 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Hake&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;934 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Pollock&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;7,899 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>SNE Winter&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;5,334 <o:p></o:p></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b>Asking Price $28,032.91
            <o:p></o:p>
        </b></p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>
            <o:p>&nbsp;</o:p>
        </p>
        <p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>
            <o:p>&nbsp;</o:p>
        </p>
        <div>
            <p class=MsoNormal>
                <o:p>&nbsp;</o:p>
            </p>
        </div>
        <p class=MsoNormal>-- <br>Stephanie Rafael-DeMello<br>IX Northeast Fishery Sector, Inc.<br>350 South Front
            Street<br>New Bedford, MA 02740<br>508.990.2800<br>Fax:508.990.2899 <o:p></o:p></p>
    </div>
    <div class=MsoNormal align=center style='text-align:center'>
</body></html>

So trying BeautifulSoup(html).find_all("table") failed.

But trying:

p_list = []
for i in BeautifulSoup(html).find_all('p'):
    p_list.append(i.next_sibling)
print("p_list:", p_list)

yields

p_list: [
<p class="MsoNormal"><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'><o:p> </o:p></span></p>,
<div>
    <p class="MsoNormal"><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'> <o:p></o:p></span>
    </p>
    <p class="MsoNormal"><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'>-----<o:p></o:p></span>
    </p>
    <p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Eric Brazer Jr. </span><i><span
            style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p></o:p></span></i>
    </p>
    <p class="MsoNormal"><i><span
            style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Manager, GB Cod Fixed Gear Sector</span></i><span
            style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
    <p class="MsoNormal"><b><span
            style='font-size:11.0pt;font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen's Alliance</span></b><span
            style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
    <p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>1566 Main Street, Chatham, MA 02633</span><span
            style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
    <p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>(508) 945-2432 x105  --  Fax: (508) 945-0981</span><span
            style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
    <p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#002776'><a
            href="mailto:melissa@capecodfishermen.org"><span style="color:#002776">eric@capecodfishermen.org</span></a></span><span
            style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
    <p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#002776'><a
            href="www.capecodfishermen.org"><span style="color:#002776">www.capecodfishermen.org</span></a></span><span
            style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
    <p class="MsoNormal"><b><span
            style='font-size:11.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>Small Boats.  Big Ideas.</span></b><span
            style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
</div>,
<p class="MsoNormal"><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'>-----<o:p></o:p></span>
</p>,
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Eric Brazer Jr. </span><i><span
        style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p></o:p></span></i>
</p>,
<p class="MsoNormal"><i><span style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Manager, GB Cod Fixed Gear Sector</span></i><span
        style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>,
<p class="MsoNormal"><b><span style='font-size:11.0pt;font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen's Alliance</span></b><span
        style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>,
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>1566 Main Street, Chatham, MA 02633</span><span
        style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>,
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>(508) 945-2432 x105  --  Fax: (508) 945-0981</span><span
        style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>,
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#002776'><a
        href="mailto:melissa@capecodfishermen.org"><span
        style="color:#002776">eric@capecodfishermen.org</span></a></span><span
        style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>,
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#002776'><a
        href="www.capecodfishermen.org"><span style="color:#002776">www.capecodfishermen.org</span></a></span><span
        style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>,
<p class="MsoNormal"><b><span style='font-size:11.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>Small Boats.  Big Ideas.</span></b><span
        style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>, None,
<div>
    <div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
        <p class="MsoNormal"><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
                style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> Stephanie Rafael [mailto:nbsector9@gmail.com] <br/><b>Sent:</b> Thursday, May 23, 2013 2:06 PM<br/><b>To:</b> Linda McCann<br/><b>Cc:</b> Aaron Dority; Eric Brazer; John Haran; Rob @ NEFS III; XI and XII NEFS INC; Hank Soule; Ben Martens; Jim Reardon; Vito Giacalone; NEFS V; calberto@luzofuel.com; DaveLeveille, NEFS II<br/><b>Subject:</b> NEFS IX packages available<o:p></o:p></span>
        </p>
    </div>
</div>, None,
<div>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hello All,<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Below are two package deals that are
        available to lease from NEFS IX. Please let me know if there is any interest.<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Thanks,<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Stephanie<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Package #1</b><o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Cod               701<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Cod             13,070<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Hadd            14,100<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Hadd          84,296<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB YT                    671<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE YT                  153<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GOM YT               2,371<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Plaice                    2,820<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Witch                    1,057<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB Winter           13,316<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Redfish                 122<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hake                      184<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Pollock                  7,427<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE Winter         28,935<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Asking Price $37,556.65</b><o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">
        <o:p> </o:p>
    </p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Package #2
        <o:p></o:p>
    </b></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Cod               432<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Cod             8,059<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Hadd            14,629<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Hadd          87,454<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB YT                    1,817<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE YT                  76<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GOM YT               200<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Plaice                    2,043<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Witch                    1,413<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB Winter           23,784<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Redfish                 122<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hake                      934<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Pollock                  7,899<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE Winter         5,334<o:p></o:p></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Asking Price $28,032.91
        <o:p></o:p>
    </b></p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">
        <o:p> </o:p>
    </p>
    <p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">
        <o:p> </o:p>
    </p>
    <div>
        <p class="MsoNormal">
            <o:p> </o:p>
        </p>
    </div>
    <p class="MsoNormal">-- <br/>Stephanie Rafael-DeMello<br/>IX Northeast Fishery Sector, Inc.<br/>350 South Front
        Street<br/>New Bedford, MA 02740<br/>508.990.2800<br/>Fax:508.990.2899<o:p></o:p></p>
</div>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Below are two package deals that are
    available to lease from NEFS IX. Please let me know if there is any interest.<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Thanks,<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Stephanie<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Package #1</b><o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Cod               701<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Cod             13,070<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Hadd            14,100<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Hadd          84,296<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB YT                    671<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE YT                  153<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GOM YT               2,371<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Plaice                    2,820<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Witch                    1,057<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB Winter           13,316<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Redfish                 122<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hake                      184<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Pollock                  7,427<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE Winter         28,935<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Asking Price $37,556.65</b><o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">
    <o:p> </o:p>
</p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Package #2
    <o:p></o:p>
</b></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Cod               432<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Cod             8,059<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Hadd            14,629<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Hadd          87,454<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB YT                    1,817<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE YT                  76<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GOM YT               200<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Plaice                    2,043<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Witch                    1,413<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB Winter           23,784<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Redfish                 122<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hake                      934<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Pollock                  7,899<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE Winter         5,334<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Asking Price $28,032.91
    <o:p></o:p>
</b></p>,None, None]

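So we can see that the valuable fish data sits inside the <p> tags. Is there a way to extract the data from everything found inside the <p> tags? I thought that creating a list and appending the values found between the <p> tags would work, but it does not.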

Any help would be appreciated.

1 answer:

Answer 0 (score: 0)

This is email HTML, so there is no well-formed table to parse. Instead, you can extract the data directly from the <p> tags.

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
lines = [p.get_text() for p in soup.find_all("p")]

get_text() strips all the HTML tags inside each <p> tag and gives you plain text like this:

[u'\xa0', u'\xa0', u'\xa0', u'-----', u'Eric Brazer Jr. ', u'Manager, GB Cod Fixed Gear Sector', u"Cape Cod Commercial Fishermen's Alliance", u'1566 Main Street, Chatham, MA 02633', u'(508) 945-2432 x105\xa0 --\xa0 Fax: (508) 945-0981', u'eric@capecodfishermen.org', u'www.capecodfishermen.org', u'Small Boats.\xa0 Big Ideas.', u'\xa0', u'From: Stephanie Rafael [mailto:nbsector9@gmail.com] Sent: Thursday, May 23, 2013 2:06 PMTo: Linda McCannCc: Aaron Dority; Eric Brazer; John Haran; Rob @ NEFS III; XI and XII NEFS INC; Hank Soule; Ben Martens; Jim Reardon; Vito Giacalone; NEFS V; calberto@luzofuel.com; DaveLeveille, NEFS IISubject: NEFS IX packages available', u'\xa0', u'Hello All,', u'Below are two package deals that are available to lease from NEFS IX. Please let me know if there is any interest.', u'Thanks,', u'Stephanie', u'Package #1', u'GBE Cod\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 701', u'GBW Cod\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 13,070', u'GBE Hadd\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 14,100', u'GBW Hadd\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 84,296', u'GB YT\xa0\xa0\xa0 \xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 671', u'SNE YT\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 153', u'GOM YT\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 2,371', u'Plaice\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 2,820', u'Witch\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 1,057', u'GB Winter\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 13,316', u'Redfish\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 122', u'Hake\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 184', u'Pollock\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 7,427', u'SNE Winter\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 28,935', u'Asking Price $37,556.65', u'\xa0', u'Package #2', u'GBE 
Cod\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 432', u'GBW Cod\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 8,059', u'GBE Hadd\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 14,629', u'GBW Hadd\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 87,454', u'GB YT\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 1,817', u'SNE YT\xa0 \xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 76', u'GOM YT\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 200', u'Plaice\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 2,043', u'Witch\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 1,413', u'GB Winter\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 23,784', u'Redfish\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 122', u'Hake\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 934', u'Pollock\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 7,899', u'SNE Winter\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 5,334', u'Asking Price $28,032.91', u'\xa0', u'\xa0', u'\xa0', u'-- Stephanie Rafael-DeMelloIX Northeast Fishery Sector, Inc.350 South Front StreetNew Bedford, MA 02740508.990.2800Fax:508.990.2899 ']
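The \xa0 characters in this output are non-breaking spaces, which come from the &nbsp; entities in the email. Below is a minimal sketch of tidying them up, assuming Python 3 strings. The helper name normalize is hypothetical and is not part of the original answer. The sample entries are copied from the output above.

```python
# A few entries copied from the get_text() output above.
lines = [
    u'\xa0',
    u'Package #1',
    u'GBE Cod\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 701',
    u'GB YT\xa0\xa0\xa0 \xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 671',
    u'Asking Price $37,556.65',
]


def normalize(text):
    """Collapse every run of whitespace, including U+00A0, into one space."""
    # str.split() with no arguments splits on any Unicode whitespace,
    # so non-breaking spaces are treated like ordinary spaces here.
    return " ".join(text.split())


# Normalize each entry and drop the ones that were only padding.
cleaned = [normalize(s) for s in lines]
cleaned = [s for s in cleaned if s]

print(cleaned)
```

Normalizing first makes the later prefix checks and column splits less sensitive to how much padding the sender happened to type.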

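Now you can iterate over the strings, use str.startswith() to find the indices of the "Package #1" and "Asking Price" strings that delimit your table, and then split each string in between into columns.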
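The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's actual code: the function name extract_package is hypothetical, the sample list is an abridged copy of the get_text() output, and splitting from the right with str.rsplit is one possible way to separate the two columns.

```python
# Abridged copy of the get_text() output (padding runs shortened).
lines = [
    u'Stephanie',
    u'Package #1',
    u'GBE Cod\xa0\xa0\xa0 701',
    u'GB Winter\xa0\xa0\xa0 13,316',
    u'Asking Price $37,556.65',
    u'\xa0',
    u'Package #2',
    u'GBE Cod\xa0\xa0\xa0 432',
    u'Asking Price $28,032.91',
]


def extract_package(lines, title):
    """Return (species, amount) tuples for the package headed by `title`."""
    # Index of the heading line, e.g. "Package #1".
    start = next(i for i, s in enumerate(lines) if s.startswith(title))
    # Index of the first "Asking Price" line after that heading.
    end = next(i for i in range(start + 1, len(lines))
               if lines[i].startswith("Asking Price"))
    rows = []
    for s in lines[start + 1:end]:
        # The amount is the last whitespace-separated field. Species names
        # can contain a space ("GB Winter"), so split once from the right.
        parts = s.rsplit(None, 1)
        if len(parts) != 2:
            continue  # skip blank or single-field lines
        rows.append((parts[0], parts[1]))
    return rows


package_1 = extract_package(lines, "Package #1")
package_2 = extract_package(lines, "Package #2")
print(package_1)
print(package_2)
```

Note that next() raises StopIteration if a heading or its closing "Asking Price" line is missing, so real emails with a different layout would need explicit handling.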

In the full code, I split the data into a list of rows, where each row is a tuple.
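Once the rows are tuples of strings, a possible next step is converting the amounts to numbers. The sketch below goes beyond what the answer shows. The helper names to_number and parse_price are hypothetical, and it assumes every amount uses commas as thousands separators, as in this email.

```python
from decimal import Decimal


def to_number(text):
    """Parse an amount such as '13,070' into the integer 13070."""
    return int(text.replace(",", ""))


def parse_price(line):
    """Parse 'Asking Price $37,556.65' into Decimal('37556.65')."""
    # Keep only what follows the dollar sign, then drop the commas.
    amount = line.split("$", 1)[1]
    return Decimal(amount.replace(",", "").strip())


# Rows as (species, amount) string tuples, taken from Package #1.
rows = [("GBE Cod", "701"), ("GBW Cod", "13,070"), ("GBE Hadd", "14,100")]

typed_rows = [(species, to_number(amount)) for species, amount in rows]
price = parse_price("Asking Price $37,556.65")

print(typed_rows)
print(price)
```

Decimal is used for the price so that the cents are not subject to binary floating-point rounding.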