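I have data embedded in the HTML of an email. Until now the data has always been in a table, but this time it is not, so I am having a hard time capturing all of it.
Here is how the data appears in the email:
Here is the HTML: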
</o:shapelayout></xml><![endif]--></head>
<body lang=EN-US link=blue vlink=purple>
<div class=WordSection1>
<p class=MsoNormal><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'><o:p> </o:p></span>
</p>
<p class=MsoNormal><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'><o:p> </o:p></span>
</p>
<div>
<p class=MsoNormal><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'> <o:p></o:p></span>
</p>
<p class=MsoNormal><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'>-----<o:p></o:p></span>
</p>
<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Eric Brazer Jr. </span><i><span
style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p></o:p></span></i>
</p>
<p class=MsoNormal><i><span
style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Manager, GB Cod Fixed Gear Sector</span></i><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
<p class=MsoNormal><b><span
style='font-size:11.0pt;font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen's Alliance</span></b><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>1566 Main Street, Chatham, MA 02633</span><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
<p class=MsoNormal><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>(508) 945-2432 x105 -- Fax: (508) 945-0981</span><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
<p class=MsoNormal><span
style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#002776'><a
href="mailto:melissa@capecodfishermen.org"><span style='color:#002776'>eric@capecodfishermen.org</span></a></span><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
<p class=MsoNormal><span
style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#002776'><a
href="www.capecodfishermen.org"><span
style='color:#002776'>www.capecodfishermen.org</span></a></span><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
<p class=MsoNormal><b><span
style='font-size:11.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>Small Boats. Big Ideas.</span></b><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
</div>
<p class=MsoNormal><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'><o:p> </o:p></span>
</p>
<div>
<div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'>
<p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> Stephanie Rafael [mailto:nbsector9@gmail.com] <br><b>Sent:</b> Thursday, May 23, 2013 2:06 PM<br><b>To:</b> Linda McCann<br><b>Cc:</b> Aaron Dority; Eric Brazer; John Haran; Rob @ NEFS III; XI and XII NEFS INC; Hank Soule; Ben Martens; Jim Reardon; Vito Giacalone; NEFS V; calberto@luzofuel.com; DaveLeveille, NEFS II<br><b>Subject:</b> NEFS IX packages available<o:p></o:p></span>
</p>
</div>
</div>
<p class=MsoNormal>
<o:p> </o:p>
</p>
<div>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Hello All, <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Below are two package deals that
are available to lease from NEFS IX. Please let me know if there is any interest. <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Thanks, <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Stephanie <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b>Package #1</b> <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBE Cod 701 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBW Cod 13,070 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBE Hadd 14,100 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBW Hadd 84,296 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GB YT 671 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>SNE YT 153 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GOM YT 2,371 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Plaice 2,820 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Witch 1,057 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GB Winter 13,316 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Redfish 122 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Hake 184 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Pollock 7,427 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>SNE Winter 28,935 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b>Asking Price $37,556.65</b> <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>
<o:p> </o:p>
</p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b>Package #2
<o:p></o:p>
</b></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBE Cod 432 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBW Cod 8,059 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBE Hadd 14,629 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GBW Hadd 87,454 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GB YT 1,817 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>SNE YT 76 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GOM YT 200 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Plaice 2,043 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Witch 1,413 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>GB Winter 23,784 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Redfish 122 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Hake 934 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>Pollock 7,899 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>SNE Winter 5,334 <o:p></o:p></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'><b>Asking Price $28,032.91
<o:p></o:p>
</b></p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>
<o:p> </o:p>
</p>
<p class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto'>
<o:p> </o:p>
</p>
<div>
<p class=MsoNormal>
<o:p> </o:p>
</p>
</div>
<p class=MsoNormal>-- <br>Stephanie Rafael-DeMello<br>IX Northeast Fishery Sector, Inc.<br>350 South Front
Street<br>New Bedford, MA 02740<br>508.990.2800<br>Fax:508.990.2899 <o:p></o:p></p>
</div>
<div class=MsoNormal align=center style='text-align:center'>
</body></html>
So trying BeautifulSoup(html).find_all("table")
failed...
But trying:
p_list = []
for i in BeautifulSoup(html).find_all('p'):
    p_list.append(i.next_sibling)
print("p_list:", p_list)
produces
p_list: [
<p class="MsoNormal"><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'><o:p> </o:p></span></p>,
<div>
<p class="MsoNormal"><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'> <o:p></o:p></span>
</p>
<p class="MsoNormal"><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'>-----<o:p></o:p></span>
</p>
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Eric Brazer Jr. </span><i><span
style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p></o:p></span></i>
</p>
<p class="MsoNormal"><i><span
style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Manager, GB Cod Fixed Gear Sector</span></i><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
<p class="MsoNormal"><b><span
style='font-size:11.0pt;font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen's Alliance</span></b><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>1566 Main Street, Chatham, MA 02633</span><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>(508) 945-2432 x105 -- Fax: (508) 945-0981</span><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#002776'><a
href="mailto:melissa@capecodfishermen.org"><span style="color:#002776">eric@capecodfishermen.org</span></a></span><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#002776'><a
href="www.capecodfishermen.org"><span style="color:#002776">www.capecodfishermen.org</span></a></span><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
<p class="MsoNormal"><b><span
style='font-size:11.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>Small Boats. Big Ideas.</span></b><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>
</div>,
<p class="MsoNormal"><span style='font-size:10.0pt;font-family:"Georgia","serif";color:#1F497D'>-----<o:p></o:p></span>
</p>,
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Eric Brazer Jr. </span><i><span
style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p></o:p></span></i>
</p>,
<p class="MsoNormal"><i><span style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Manager, GB Cod Fixed Gear Sector</span></i><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>,
<p class="MsoNormal"><b><span style='font-size:11.0pt;font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen's Alliance</span></b><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>,
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>1566 Main Street, Chatham, MA 02633</span><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>,
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>(508) 945-2432 x105 -- Fax: (508) 945-0981</span><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>,
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#002776'><a
href="mailto:melissa@capecodfishermen.org"><span
style="color:#002776">eric@capecodfishermen.org</span></a></span><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>,
<p class="MsoNormal"><span style='font-size:9.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#002776'><a
href="www.capecodfishermen.org"><span style="color:#002776">www.capecodfishermen.org</span></a></span><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>,
<p class="MsoNormal"><b><span style='font-size:11.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>Small Boats. Big Ideas.</span></b><span
style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p></o:p></span></p>, None,
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span
style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> Stephanie Rafael [mailto:nbsector9@gmail.com] <br/><b>Sent:</b> Thursday, May 23, 2013 2:06 PM<br/><b>To:</b> Linda McCann<br/><b>Cc:</b> Aaron Dority; Eric Brazer; John Haran; Rob @ NEFS III; XI and XII NEFS INC; Hank Soule; Ben Martens; Jim Reardon; Vito Giacalone; NEFS V; calberto@luzofuel.com; DaveLeveille, NEFS II<br/><b>Subject:</b> NEFS IX packages available<o:p></o:p></span>
</p>
</div>
</div>, None,
<div>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hello All,<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Below are two package deals that are
available to lease from NEFS IX. Please let me know if there is any interest.<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Thanks,<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Stephanie<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Package #1</b><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Cod 701<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Cod 13,070<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Hadd 14,100<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Hadd 84,296<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB YT 671<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE YT 153<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GOM YT 2,371<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Plaice 2,820<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Witch 1,057<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB Winter 13,316<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Redfish 122<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hake 184<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Pollock 7,427<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE Winter 28,935<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Asking Price $37,556.65</b><o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">
<o:p> </o:p>
</p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Package #2
<o:p></o:p>
</b></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Cod 432<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Cod 8,059<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Hadd 14,629<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Hadd 87,454<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB YT 1,817<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE YT 76<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GOM YT 200<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Plaice 2,043<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Witch 1,413<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB Winter 23,784<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Redfish 122<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hake 934<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Pollock 7,899<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE Winter 5,334<o:p></o:p></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Asking Price $28,032.91
<o:p></o:p>
</b></p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">
<o:p> </o:p>
</p>
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">
<o:p> </o:p>
</p>
<div>
<p class="MsoNormal">
<o:p> </o:p>
</p>
</div>
<p class="MsoNormal">-- <br/>Stephanie Rafael-DeMello<br/>IX Northeast Fishery Sector, Inc.<br/>350 South Front
Street<br/>New Bedford, MA 02740<br/>508.990.2800<br/>Fax:508.990.2899<o:p></o:p></p>
</div>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Below are two package deals that are
available to lease from NEFS IX. Please let me know if there is any interest.<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Thanks,<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Stephanie<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Package #1</b><o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Cod 701<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Cod 13,070<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Hadd 14,100<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Hadd 84,296<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB YT 671<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE YT 153<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GOM YT 2,371<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Plaice 2,820<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Witch 1,057<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB Winter 13,316<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Redfish 122<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hake 184<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Pollock 7,427<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE Winter 28,935<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Asking Price $37,556.65</b><o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">
<o:p> </o:p>
</p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Package #2
<o:p></o:p>
</b></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Cod 432<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Cod 8,059<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBE Hadd 14,629<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GBW Hadd 87,454<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB YT 1,817<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE YT 76<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GOM YT 200<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Plaice 2,043<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Witch 1,413<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">GB Winter 23,784<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Redfish 122<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Hake 934<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">Pollock 7,899<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto">SNE Winter 5,334<o:p></o:p></p>,
<p class="MsoNormal" style="mso-margin-top-alt:auto;mso-margin-bottom-alt:auto"><b>Asking Price $28,032.91
<o:p></o:p>
</b></p>,None, None]
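So we can see that the valuable fish data sits between the <p> tags. But is there a way to extract the data from everything found between the <p> tags? I thought that creating a list and appending the values found between the <p> tags would work, but it does not.
Any help would be appreciated.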
Answer 0 (score: 0)
This is email HTML, so there is no well-formed table to parse. Instead, you can extract the data directly from the <p> tags.
from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")
lines = [p.get_text() for p in soup.find_all("p")]
get_text() strips all HTML markup from each <p> tag and gives you plain text like this:
[u'\xa0', u'\xa0', u'\xa0', u'-----', u'Eric Brazer Jr. ', u'Manager, GB Cod Fixed Gear Sector', u"Cape Cod Commercial Fishermen's Alliance", u'1566 Main Street, Chatham, MA 02633', u'(508) 945-2432 x105\xa0 --\xa0 Fax: (508) 945-0981', u'eric@capecodfishermen.org', u'www.capecodfishermen.org', u'Small Boats.\xa0 Big Ideas.', u'\xa0', u'From: Stephanie Rafael [mailto:nbsector9@gmail.com] Sent: Thursday, May 23, 2013 2:06 PMTo: Linda McCannCc: Aaron Dority; Eric Brazer; John Haran; Rob @ NEFS III; XI and XII NEFS INC; Hank Soule; Ben Martens; Jim Reardon; Vito Giacalone; NEFS V; calberto@luzofuel.com; DaveLeveille, NEFS IISubject: NEFS IX packages available', u'\xa0', u'Hello All,', u'Below are two package deals that are available to lease from NEFS IX. Please let me know if there is any interest.', u'Thanks,', u'Stephanie', u'Package #1', u'GBE Cod\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 701', u'GBW Cod\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 13,070', u'GBE Hadd\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 14,100', u'GBW Hadd\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 84,296', u'GB YT\xa0\xa0\xa0 \xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 671', u'SNE YT\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 153', u'GOM YT\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 2,371', u'Plaice\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 2,820', u'Witch\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 1,057', u'GB Winter\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 13,316', u'Redfish\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 122', u'Hake\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 184', u'Pollock\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 7,427', u'SNE Winter\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 28,935', u'Asking Price $37,556.65', u'\xa0', u'Package #2', u'GBE 
Cod\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 432', u'GBW Cod\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 8,059', u'GBE Hadd\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 14,629', u'GBW Hadd\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 87,454', u'GB YT\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 1,817', u'SNE YT\xa0 \xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 76', u'GOM YT\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 200', u'Plaice\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 2,043', u'Witch\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 1,413', u'GB Winter\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 23,784', u'Redfish\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 122', u'Hake\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 934', u'Pollock\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 7,899', u'SNE Winter\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 5,334', u'Asking Price $28,032.91', u'\xa0', u'\xa0', u'\xa0', u'-- Stephanie Rafael-DeMelloIX Northeast Fishery Sector, Inc.350 South Front StreetNew Bedford, MA 02740508.990.2800Fax:508.990.2899 ']
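The \xa0 runs in the list above are non-breaking spaces, which the email uses as padding between each name and its number. As a minimal sketch (assuming Python 3, where every string is Unicode; `normalize` is a hypothetical helper name, and `lines` here is only a shortened copy of the list above), they can be collapsed to single ordinary spaces and the blank entries dropped:

```python
# A shortened copy of the list shown above (same strings, fewer entries).
lines = [
    "\xa0",
    "Package #1",
    "GBE Cod\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 701",
    "GB Winter\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0\xa0 13,316",
    "Asking Price $37,556.65",
    "\xa0",
]

def normalize(text):
    # str.split() with no argument splits on any Unicode whitespace,
    # U+00A0 included, so re-joining leaves single ordinary spaces.
    return " ".join(text.split())

# Keep only the entries that still contain text after normalizing.
cleaned = [normalize(s) for s in lines if normalize(s)]
print(cleaned)
# ['Package #1', 'GBE Cod 701', 'GB Winter 13,316', 'Asking Price $37,556.65']
```

Normalizing first makes the later steps simpler, because every remaining string is either a heading, a "name number" pair, or a price line.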
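Now you can iterate over the strings and use str.startswith() to find where each table begins and ends, using the "Package #1" and "Asking Price" strings as markers, and then split each string in between into columns.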
In the full code I split the data into a list of rows, where each row is a tuple.
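The approach described above could look roughly like the following. This is only a sketch under assumptions, not the linked full code: `parse_packages` is a hypothetical helper, the use of `str.rsplit` to separate the name from the number is my choice, and it assumes Python 3 and that every data line ends in a whole number with optional thousands separators.

```python
def parse_packages(lines):
    """Return a list of (title, rows, asking_price), one entry per package.

    rows is a list of (name, amount) tuples in the order they appear.
    """
    packages = []
    title, rows = None, []
    for raw in lines:
        line = raw.strip()  # strip() also removes \xa0 and newlines
        if line.startswith("Package #"):
            # A heading opens a new table.
            title, rows = line, []
        elif title is None or not line:
            # Greeting, signature, or blank paragraph: not table data.
            continue
        elif line.startswith("Asking Price"):
            # The price line closes the current table.
            price_text = line.rsplit(None, 1)[1]
            price = float(price_text.lstrip("$").replace(",", ""))
            packages.append((title, rows, price))
            title = None
        else:
            # Split off the last whitespace-separated token as the number,
            # so names containing spaces ("GB Winter") stay intact.
            parts = line.rsplit(None, 1)
            if len(parts) == 2 and parts[1].replace(",", "").isdigit():
                rows.append((parts[0], int(parts[1].replace(",", ""))))
    return packages


# A few strings in the same shape as the get_text() output.
sample = [
    "Hello All,",
    "Package #1",
    "GBE Cod\xa0\xa0 701",
    "GB Winter\xa0 13,316",
    "Asking Price $37,556.65",
    "\xa0",
    "Package #2\n",
    "SNE YT\xa0 \xa0 76",
    "Asking Price $28,032.91",
]

for title, rows, price in parse_packages(sample):
    print(title, rows, price)
```

Splitting from the right with a limit of one keeps multi-word names together without having to guess how many non-breaking spaces separate the two columns.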