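I am trying to use the JavaMail API to read the message body and, if it has any content, store it in a text file.
I can read the body of the mail, but it comes with some HTML elements in it.
Below is the code I have used.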
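import java.util.Properties;
import javax.mail.*;
import javax.mail.Flags.Flag;
import javax.mail.search.FlagTerm;
import org.jsoup.Jsoup;

Properties props = System.getProperties();
props.setProperty("mail.store.protocol", "imaps");
Session session = Session.getDefaultInstance(props, null);
Store store = session.getStore("imaps");
store.connect("hostname", "username", "password");
Folder inbox = store.getFolder("Inbox");
inbox.open(Folder.READ_ONLY);
// Fetch only unread messages (SEEN flag not set)
Message[] messages = inbox.search(new FlagTerm(new Flags(Flag.SEEN), false));
for (Message message : messages) {
    // Jsoup.parse expects a String, not a Message; getContent() returns a String for text/plain and text/html bodies
    Object content = message.getContent();
    if (content instanceof String) {
        System.out.println(Jsoup.parse((String) content).text());
    }
}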
How can I remove these HTML elements from the retrieved mail?
Could someone please help me solve this?
Answer 0 (score: 1)
To remove all HTML tags from the mail, use jsoup's text() method.
Example code
String htmlString = "<div class=\"WordSection1\"> <p class=\"MsoNormal\">Hi<br> <br> <br> <br> Data is written in this mail.<br> <br> <br> <br> <o:p></o:p></p> </div>";
System.out.println(Jsoup.parse(htmlString).text());
Output
Hi Data is written in this mail.
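The question also asks to store the body in a text file only when it has content. A minimal sketch of that step, using only the JDK, might look like the following. The class name BodySaver, the method saveIfNotEmpty and the temporary file are made up for illustration; the string passed in is assumed to be the result of the text() call above.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

class BodySaver {
    // Hypothetical helper: writes the text to the target file only if it is
    // non-null and not blank. Returns true if something was written.
    static boolean saveIfNotEmpty(String text, Path target) throws IOException {
        if (text == null || text.trim().isEmpty()) {
            return false; // nothing worth saving
        }
        Files.write(target, text.getBytes(StandardCharsets.UTF_8));
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for Jsoup.parse(content).text()
        String cleaned = "Hi Data is written in this mail.";
        Path target = Files.createTempFile("mail-body", ".txt");
        boolean saved = saveIfNotEmpty(cleaned, target);
        System.out.println("saved=" + saved + " file=" + target);
    }
}
```

Inside the message loop this would be called once per message, with a file name of your choosing for each one.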
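If specific elements should produce line breaks, similar to the rendered HTML, you can add the line breaks yourself during jsoup's cleaning step and then avoid pretty printing.
prettyPrint
If disabled, the HTML output methods will not re-format the output, and the output will generally look like the input.
Example code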
String htmlString = "<div class=\"WordSection1\"> <p class=\"MsoNormal\">Hi<br> <br> <br> <br> Data is written in this mail.<br> <br> <br> <br> <o:p></o:p></p> </div>";
htmlString = htmlString.replaceAll("<br>", System.getProperty("line.separator") + "<br>"); // do replacements for all tags that should result in line-breaks
Document.OutputSettings settings = new Document.OutputSettings();
settings.prettyPrint(false); // to keep line-breaks
// Note: newer jsoup releases rename Whitelist to Safelist; use Safelist.none() there
String cleanedSource = Jsoup.clean(htmlString, "", Whitelist.none(), settings);
System.out.println(cleanedSource);
Output
Hi
Data is written in this mail.
[... four more empty lines]
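The comment in the example says to do the replacement for all tags that should result in line breaks. One way to cover several tags at once, sketched here with only the JDK, is a single case-insensitive regular expression. The class name LineBreakMarker, the method markLineBreaks and the choice of tags (br, closing p, closing div) are assumptions for illustration, not part of jsoup; the result would then be passed to Jsoup.clean with pretty printing disabled, as in the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LineBreakMarker {
    // Tags assumed to end a line when rendered; adjust the list for your mails.
    // Matches <br>, <br/>, <BR >, </p>, </div>, but not e.g. </pre>.
    private static final Pattern BREAKING_TAGS =
            Pattern.compile("(?i)<(br|/p|/div)\\b[^>]*>");

    // Inserts the platform line separator in front of every matching tag,
    // leaving the tag itself in place for the later cleaning step.
    static String markLineBreaks(String html) {
        String replacement = Matcher.quoteReplacement(System.lineSeparator()) + "$0";
        return BREAKING_TAGS.matcher(html).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String html = "<div><p>Hi<br><BR/>Data is written in this mail.</p></div>";
        System.out.println(markLineBreaks(html));
    }
}
```

A regular expression is enough here because it only marks positions; removing the tags is still left to jsoup.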