I am trying to extract the readable content of emails retrieved over POP3 using regular expressions.
I removed every <script> ... </script> and <style> ... </style> element together with its entire content.
I converted every <br> tag to "\n".
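I removed all HTML tags and extracted the whole email content into a string, using a regex such as regex = "<[^>]*>"; (this removes only the tags and their attributes, not the text between them).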
I added extra whitespace and line breaks to the composed mail. Please read this message in your browser's view-source mode so you can see what I need.
Composed mail content:
Testing white space:
hi hello then whats up man., is it cool
The policy set up by your network administrator requires that you authenticate yourself with this firewall before you can have access. To authenticate yourself click on the following link and enter your user name and password to log in to the firewall.
The mail as retrieved over POP3:
<html><body><span style="font-family:Verdana; color:#000000; font-size:10pt;"><div><span style="font-family: Verdana; color: rgb(0, 0, 0); font-size: 10pt;"><span style="font-family: Verdana; color: rgb(0, 0, 0); font-size: 10pt;"><font style="font-family: Verdana;" color="#000000" size="2" face="Verdana"><font style="font-family: Verdana;" color="#000000" size="2" face="Verdana"><font style="font-family: Verdana;" color="#000000" size="2" face="Verdana"><font style="font-family: Verdana;" color="#000000" size="2" face="Verdana">Testing white space:<br>hi hello then whats up man., is it cool<br><br>The
policy set up by your network administrator requires that
you authenticate yourself with this firewall before you can have
access. To authenticate yourself click on
the following link and enter your user name and
password to log in to the firewall. </font></font></font></font></span></span></div></span></body></html>
The formatted string produced from the HTML code:
Testing white space:
hi hello then whats up man., is it cool
The
policy set up by your network administrator requires that
you authenticate yourself with this firewall before you can have
access. To authenticate yourself click on
the following link and enter your user name and
password to log in to the firewall.
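I collapse runs of extra spaces into a single space, and if there are more than two consecutive line breaks, I replace them with two line breaks.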
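The whitespace cleanup described above could look roughly like the following. This is only a minimal sketch in Python, not the code from the question; the function name `normalize_whitespace` is made up for illustration, and it assumes the text already uses "\n" for line breaks.

```python
import re

def normalize_whitespace(text: str) -> str:
    """Hypothetical helper: collapse spaces and cap consecutive newlines at two."""
    # Collapse runs of spaces and tabs into one space; newlines are left alone.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop a single space that ended up directly before or after a newline.
    text = re.sub(r" ?\n ?", "\n", text)
    # Replace three or more consecutive newlines with exactly two.
    return re.sub(r"\n{3,}", "\n\n", text)
```

Note that this step only tidies whitespace that is already in the string. It cannot tell a line break that came from a <br> apart from one that was present in the HTML source, which is why the stray break between "The" and "policy" survives it.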
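In the formatted string there is an unwanted line break between "The" and "policy". I cannot work out why it happens; my guess is that POP3 adds it. Can anyone help me format the string? Thanks in advance.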
Answer 0 (score: 0)
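In HTML, a line break in the source is treated as ordinary whitespace. If you replace the line breaks in the retrieved mail with single spaces before you convert <br> to newlines, you should get the expected result.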