我的目标是使用RegEx扫描电子邮件中的单词" trade"然后打印找到它的整行。
我成功使用RegEx捕获此HTML文档中的其他数据(例如物种,重量,价格等),以及成功识别单词" trade",但是我没有打印出它所在的整条生产线。我确实尝试使用BeautifulSoup来实现这个目标,但是这样做有很多困难。
理想情况下,我想捕捉和打印“" trade"找到了。以下是我用来尝试识别" trade"的代码。并打印它上面的行:
with open(file_path, 'r') as f:
email = f.read()
pattern = re.search(r'\btrade\b',email).group(0)
match = re.search(r'\btrade\b', email)
if match:
for line in email:
print("TRADE STUFF:",line)
请注意,我尝试过各种方法,例如print("TRADE STUFF:", line.splitlines())
以及print("TRADE STUF:", line.stripped_strings)
,但都没有成功。
感谢您的帮助。
HTML code:
<html>
<head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>FW: NEFS 5 Available Fish</title>
<link rel="important stylesheet" href="">
<style>div.headerdisplayname {font-weight:bold;}</style></head>
<body>
<table border=0 cellspacing=0 cellpadding=0 width="100%" class="header-part1"><tr><td><b>Subject: </b>FW: NEFS 5 Available Fish</td></tr><tr><td><b>From: </b>Claire Fitz-Gerald <claire@capecodfishermen.org></td></tr><tr><td><b>Date: </b>9/5/2014 9:52 AM</td></tr></table><br>
<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head><META HTTP-EQUIV="Content-Type" CONTENT="text/html; "><meta name=Generator content="Microsoft Word 12 (filtered medium)"><!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:Wingdings;
panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
{font-family:"Franklin Gothic Book";
panose-1:2 11 5 3 2 1 2 2 2 4;}
@font-face
{font-family:"Franklin Gothic Demi";
panose-1:2 11 7 3 2 1 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:1512259006;
mso-list-template-ids:-893643712;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:\F0B7;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]--></head><body lang=EN-US link=blue vlink=purple><div class=WordSection1><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Apologies for the delay in distributing this listing. It got lost in my inbox.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Please see the below quota listings.<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'>Thanks,<o:p></o:p></span></p><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><div><p class=MsoNormal><span style='font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'>Claire Fitz-Gerald<o:p></o:p></span></p><p class=MsoNormal><i><span style='font-size:10.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#1F497D'><o:p> </o:p></span></i></p><p class=MsoNormal><b><span style='font-size:11.0pt;font-family:"Franklin Gothic Demi","sans-serif";color:#002776'>Cape Cod Commercial Fishermen's Alliance<o:p></o:p></span></b></p><p class=MsoNormal><b><span style='font-size:11.0pt;font-family:"Franklin Gothic Book","sans-serif";color:#DE3500'>~ Small Boats. Big Ideas. ~</span></b><b><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#DE3500'><o:p></o:p></span></b></p></div><p class=MsoNormal><span style='font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D'><o:p> </o:p></span></p><div><div style='border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in'><p class=MsoNormal><b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'>From:</span></b><span style='font-size:10.0pt;font-family:"Tahoma","sans-serif"'> NEFS V [mailto:nefsector5@gmail.com] <br><b>Sent:</b> Monday, September 01, 2014 8:46 PM<br><b>To:</b> mike walsh - 6; NEFS 11 & 12 - Josh Wiersma; NEFS 13 John Haran; NEFS 2 - Dave Leveille; NEFS 3 - Rob Banks; NEFS 6 & 10 Jim Reardon; NEFS 7 & 8 - Linda MaCann; NEFS 9 - Stephanie Rafael-DeMello; paula lynch - 10; Claire Fitz-Gerald; Sector - MCCS; Sector - NCCS; Sector - Sustainable Harvest; tory bramante- 6<br><b>Subject:</b> NEFS 5 Available Fish<o:p></o:p></span></p></div></div><p class=MsoNormal><o:p> </o:p></p><div><p class=MsoNormal>All,<br>NEFS 5 has the following fish available for lease/trade:<o:p></o:p></p></div><div><ul type=disc><li class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><strong><span style='font-size:13.5pt'>GB EAST cod: 954 lbs @ $0.83</span></strong><o:p></o:p></li><li class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><strong><span style='font-size:13.5pt'>GB EAST cod: 1,046 lbs to trade for 1,830 lbs GB WEST cod</span></strong><o:p></o:p></li><li class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><strong><span style='font-size:13.5pt'>GB blackback: 30,000 lbs @ $0.07</span></strong><o:p></o:p></li><li class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><strong><span style='font-size:13.5pt'>GOM blackback: 800 lbs @ $0.03</span></strong><o:p></o:p></li><li class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><strong><span style='font-size:13.5pt'>white hake: 6,322 lbs @ $0.13</span></strong><o:p></o:p></li><li class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><strong><span style='font-size:13.5pt'>pollock: 22,000 lbs @ $0.015</span></strong><o:p></o:p></li><li class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><strong><span style='font-size:13.5pt'>redfish: 14,000 lbs @ $0.015</span></strong><o:p></o:p></li><li class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><strong><span style='font-size:13.5pt'>GB yt: 1,873 lbs @ $1.13</span></strong><o:p></o:p></li><li class=MsoNormal style='mso-margin-top-alt:auto;mso-margin-bottom-alt:auto;mso-list:l0 level1 lfo1'><strong><span style='font-size:13.5pt'>GB yt: 5,127 lbs to trade for 10,254 lbs SNE yt</span></strong><o:p></o:p></li></ul><div><p class=MsoNormal> <o:p></o:p></p></div><div><p class=MsoNormal>-- <o:p></o:p></p></div><div><p class=MsoNormal> <o:p></o:p></p></div></div><div><p class=MsoNormal>Daniel Salerno, NEFS 5<o:p></o:p></p></div><div><p class=MsoNormal>C/O NESTCo.<o:p></o:p></p></div><div><p class=MsoNormal>55 State Street<o:p></o:p></p></div><div><p class=MsoNormal>Narragansett, RI 02882<o:p></o:p></p></div><div><p class=MsoNormal>401-932-0070<o:p></o:p></p></div><div><p class=MsoNormal>401-633-6539 (fax)<o:p></o:p></p></div><div><p class=MsoNormal><a href="mailto:nefsector5@gmail.com" target="_blank">nefsector5@gmail.com</a><o:p></o:p></p></div><div class=MsoNormal align=center style='text-align:center'></body></html>
</body>
</html>
答案 0 :(得分:1)
我会这样做:
with open(file_path, 'r') as f:
while 1:
line=f.readline()
if not line:
break
if "trade" in line.lower():
tags=line.replace('>','<').split('<')
for tag in tags:
if "trade" in tag.lower():
print("TRADE STUFF: ",tag.strip())
答案 1 :(得分:0)
切换'for'循环和'if'语句,如下所示:
for line in email:
if match:
print("TRADE STUFF: ", line)