Regex to remove email addresses from HTML

Time: 2015-10-03 04:46:16

Tags: java html regex

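I have HTML input to a method, and I need to remove email addresses from it. The problem is that an email address is not contained in a single div; it is split across several divs. Sample input: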
div  class="p" id="p9" style="top:89.17999pt;left:430.7740pt;font-family:Times New Roman;font-size:1.0pt;">hello</div>
div class="p" id="p10" style="top:89.17999pt;left:484.100pt;font-family:Times New Roman;font-size:1.0pt;">.</div>
div class="p" id="p11" style="top:89.17999pt;left:487.100pt;font-family:Times New Roman;font-size:1.0pt;">p</div>
<div class="p" id="p1" style="top:89.17999pt;left:493.9300pt;font-family:Times New Roman;font-size:1.0pt;">@</div>
div class="p" id="p13" style="top:89.17999pt;left:0.09003pt;font-family:Times New Roman;font-size:1.0pt;">gmail</div>
div class="p" id="p" style="top:89.17999pt;left:33.18pt;font-family:Times New Roman;font-size:1.0pt;">.</div>
<div class="r" style="left:79.84pt;bottom:9.pt;width:479.98004pt;height:1.71997pt;background-color:#d9d9d9;">&nbsp;</div>
div class="p" id="p1" style="top:89.17999pt;left:3.18pt;font-family:Times New Roman;font-size:1.0pt;">com</div>"

The regex we are using is [A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,6}, but it only handles email addresses in the standard, contiguous format. Any help would be greatly appreciated.
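
To make the problem concrete, here is a minimal sketch of how that regex behaves on contiguous text versus text split across divs. It assumes the pattern is compiled with Pattern.CASE_INSENSITIVE (the question does not say); the class name, method name, and shortened input strings are hypothetical.

```java
import java.util.regex.Pattern;

public class EmailRegexDemo {
    // Pattern from the question. CASE_INSENSITIVE is an assumption:
    // without it, [A-Z] would not match lowercase addresses at all.
    static final Pattern EMAIL = Pattern.compile(
            "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,6}", Pattern.CASE_INSENSITIVE);

    // True if the regex finds an email-shaped substring anywhere in s.
    static boolean containsEmail(String s) {
        return EMAIL.matcher(s).find();
    }

    public static void main(String[] args) {
        String plain = "contact: hello.p@gmail.com";
        // Same address, one fragment per div (shortened, hypothetical markup).
        String split = "<div>hello</div><div>.</div><div>p</div><div>@</div>"
                + "<div>gmail</div><div>.</div><div>com</div>";

        System.out.println(containsEmail(plain)); // true
        System.out.println(containsEmail(split)); // false: '>' and '<' surround the '@'
    }
}
```

The second call fails because the characters next to "@" are tag delimiters, which are not in either character class.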

Edit: removed the opening bracket of the div start tags, because the page was parsing them.

1 answer:

Answer 0: (score: 0)

This works for me.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Class name is arbitrary; it only makes the snippet compile on its own.
public class ExtractEmailText {
    public static void main(String[] args) {
        // Sample input from the question (the opening "<" is missing on most divs).
        String text = "div  class=\"p\" id=\"p9\" style=\"top:89.17999pt;left:430.7740pt;font-family:Times New Roman;font-size:1.0pt;\">hello</div>\n"
                + "div class=\"p\" id=\"p10\" style=\"top:89.17999pt;left:484.100pt;font-family:Times New Roman;font-size:1.0pt;\">.</div>\n"
                + "div class=\"p\" id=\"p11\" style=\"top:89.17999pt;left:487.100pt;font-family:Times New Roman;font-size:1.0pt;\">p</div>\n"
                + "<div class=\"p\" id=\"p1\" style=\"top:89.17999pt;left:493.9300pt;font-family:Times New Roman;font-size:1.0pt;\">@</div>\n"
                + "div class=\"p\" id=\"p13\" style=\"top:89.17999pt;left:0.09003pt;font-family:Times New Roman;font-size:1.0pt;\">gmail</div>\n"
                + "div class=\"p\" id=\"p\" style=\"top:89.17999pt;left:33.18pt;font-family:Times New Roman;font-size:1.0pt;\">.</div>\n"
                + "<div class=\"r\" style=\"left:79.84pt;bottom:9.pt;width:479.98004pt;height:1.71997pt;background-color:#d9d9d9;\">&nbsp;</div>\n"
                + "div class=\"p\" id=\"p1\" style=\"top:89.17999pt;left:3.18pt;font-family:Times New Roman;font-size:1.0pt;\">com</div>\"";

        StringBuilder sb = new StringBuilder();
        String[] tokens = text.split("\n"); // one div per line

        // Capture the text between the tag's closing '>' and "</div".
        // Only class="p" divs are matched; the plain pattern ".*>(.*)</div.*"
        // would also pick up the "&nbsp;" from the class="r" spacer div.
        Pattern p = Pattern.compile(".*class=\"p\".*>(.*)</div.*");

        for (String line : tokens) {
            Matcher m = p.matcher(line);
            if (m.matches()) {
                sb.append(m.group(1)); // stitch the fragments back together
            }
        }

        System.out.println(sb.toString()); // hello.p@gmail.com
    }
}

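Edit: if the input contains more divs, you may need to adjust the pattern so that it matches only the divs that make up the email address.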
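
The code above only reassembles the text, so as a possible next step here is a rough sketch of actually removing the divs that carry the address. It is not the answerer's method. It assumes well-formed `<div class="p" ...>text</div>` elements with no nested tags (unlike the sample, where the leading "<" was stripped), and the class name, method name, and P_DIV pattern are all hypothetical. The idea is to join the inner text of the p-divs, remember which div each character came from, run the email regex on the joined text, and cut out every div that contributed to a match.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitEmailRemover {
    // Assumption: p-divs are well formed and contain plain text only.
    private static final Pattern P_DIV =
            Pattern.compile("<div\\s+class=\"p\"[^>]*>([^<]*)</div>\\s*");
    // Regex from the question; CASE_INSENSITIVE is an assumption.
    private static final Pattern EMAIL = Pattern.compile(
            "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,6}", Pattern.CASE_INSENSITIVE);

    static String removeEmails(String html) {
        List<int[]> spans = new ArrayList<>();    // {start, end} of each p-div in html
        List<Integer> owner = new ArrayList<>();  // div index for each joined character
        StringBuilder joined = new StringBuilder();

        Matcher d = P_DIV.matcher(html);
        while (d.find()) {
            spans.add(new int[] {d.start(), d.end()});
            String inner = d.group(1);
            for (int i = 0; i < inner.length(); i++) {
                owner.add(spans.size() - 1);
            }
            joined.append(inner);
        }

        // Mark every div that contributed at least one character to a match.
        boolean[] drop = new boolean[spans.size()];
        Matcher e = EMAIL.matcher(joined);
        while (e.find()) {
            for (int i = e.start(); i < e.end(); i++) {
                drop[owner.get(i)] = true;
            }
        }

        // Copy the html, skipping the marked divs.
        StringBuilder out = new StringBuilder();
        int pos = 0;
        for (int i = 0; i < spans.size(); i++) {
            if (drop[i]) {
                out.append(html, pos, spans.get(i)[0]);
                pos = spans.get(i)[1];
            }
        }
        out.append(html, pos, html.length());
        return out.toString();
    }

    public static void main(String[] args) {
        String html = "<div class=\"p\">Mail:</div><div class=\"p\">hello</div>"
                + "<div class=\"p\">.</div><div class=\"p\">p</div><div class=\"p\">@</div>"
                + "<div class=\"p\">gmail</div><div class=\"p\">.</div>"
                + "<div class=\"r\">&nbsp;</div><div class=\"p\">com</div>";
        System.out.println(removeEmails(html));
    }
}
```

One caveat: because the fragments are joined without separators, a word in a div directly before the address (with no punctuation or space) would be glued onto the local part and removed with it. For real-world HTML, an HTML parser is likely more robust than regex-based tag matching.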