I have this code, which implements ParserCallback and converts an HTML email to plain text. It works fine when I parse an email body like this:
email =
"DO NOT REPLY TO THIS EMAIL MESSAGE. <br>---------------------------------------<br>\n" +
"nix<br>---------------------------------------<br> Esfghjdfkj\n" +
"</blockquote></div><br><br clear=\"all\"><div><br></div>-- <br><div dir=\"ltr\"><b>Regards <br>Nisj<br>Software Engineer<br></b><div><b>Bingo</b></div></div>\n" +
"</div>"
But when I parse this kind of email body, it returns null:
email = "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html charset=us-ascii\"></head><body style=\"word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;\">Got it...so pls send to customer now.<div><br><div style=\"\"><div>On Nov 8, 2013, at 12:31 PM, <a href=\"mailto:xxxxxxx.com\">xxxxxxx.com</a> wrote:</div><br class=\"Apple-interchange-newline\"><blockquote type=\"cite\">Forwarding test.<br>---------------------------------------<br> ABCD.</blockquote></div><br></div></body></html>";
Code:
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML.Attribute;
import javax.swing.text.html.HTML.Tag;
import javax.swing.text.html.HTMLEditorKit.Parser;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;
public class EmailBody {
    public static void main(String[] args) throws IOException
    {
        String email = "";
        class EmailCallback extends ParserCallback
        {
            private String body_;
            private boolean divStarted_;
            public String getBody()
            {
                return body_;
            }
            @Override
            public void handleStartTag(Tag t, MutableAttributeSet a, int pos)
            {
                if (t.equals(Tag.DIV) && "ltr".equals(a.getAttribute(Attribute.DIR)))
                {
                    divStarted_ = true;
                }
            }
            @Override
            public void handleEndTag(Tag t, int pos)
            {
                if (t.equals(Tag.DIV))
                {
                    divStarted_ = false;
                }
            }
            @Override
            public void handleText(char[] data, int pos)
            {
                if (divStarted_)
                {
                    body_ = new String(data);
                }
            }
        }
        EmailCallback callback = new EmailCallback();
        Parser parser = new ParserDelegator();
        StringReader reader = new StringReader(email);
        parser.parse(reader, callback, true);
        reader.close();
        System.out.println(callback.getBody());
    }
}
Can you tell me why this happens?
Answer 0 (score: 1)
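Your code only picks up text from DIV elements whose dir attribute has the value ltr. The handleText method processes element text only while the divStarted_ flag is true, and that happens only after handleStartTag has set the flag to true.
The first email example contains such an element; the second does not, so body_ is never assigned and getBody() returns null.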
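If the goal is the plain text of the whole message rather than only the text inside a DIV with dir="ltr", one possible approach is to drop the flag and append every text chunk the parser reports. The following is a minimal sketch under that assumption, not a drop-in fix: the class name PlainTextExtractor and the method extract are hypothetical, and the sample email in main is a shortened version of the second example. It also appends to a StringBuilder, because the original handleText overwrites body_ on every call and so keeps only the last chunk.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.html.HTMLEditorKit.ParserCallback;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical helper class, not part of the original question.
public class PlainTextExtractor {

    // Collects the text of every element, one chunk per line,
    // instead of only the text inside <div dir="ltr">.
    public static String extract(String html) throws IOException {
        final StringBuilder body = new StringBuilder();
        ParserCallback callback = new ParserCallback() {
            @Override
            public void handleText(char[] data, int pos) {
                if (body.length() > 0) {
                    body.append('\n'); // separate consecutive text chunks
                }
                body.append(data);     // append rather than overwrite
            }
        };
        try (StringReader reader = new StringReader(html)) {
            // true = ignore any charset declared in the document
            new ParserDelegator().parse(reader, callback, true);
        }
        return body.toString();
    }

    public static void main(String[] args) throws IOException {
        // Shortened version of the second email; it has no <div dir="ltr">.
        String email = "<html><body>Got it.<div><blockquote type=\"cite\">"
                + "Forwarding test.<br> ABCD.</blockquote></div></body></html>";
        System.out.println(extract(email));
    }
}
```

With this callback the result is an empty string rather than null when the document contains no text at all, which may or may not be what the caller wants.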