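I have HTML input to a method, and I need to remove email addresses from it. The problem is that the email address is not inside a single div; it is split across several divs. See the sample input below:
div class="p" id="p9" style="top:89.17999pt;left:430.7740pt;font-family:Times New Roman;font-size:1.0pt;">hello</div>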
div class="p" id="p10" style="top:89.17999pt;left:484.100pt;font-family:Times New Roman;font-size:1.0pt;">.</div>
div class="p" id="p11" style="top:89.17999pt;left:487.100pt;font-family:Times New Roman;font-size:1.0pt;">p</div>
<div class="p" id="p1" style="top:89.17999pt;left:493.9300pt;font-family:Times New Roman;font-size:1.0pt;">@</div>
div class="p" id="p13" style="top:89.17999pt;left:0.09003pt;font-family:Times New Roman;font-size:1.0pt;">gmail</div>
div class="p" id="p" style="top:89.17999pt;left:33.18pt;font-family:Times New Roman;font-size:1.0pt;">.</div>
<div class="r" style="left:79.84pt;bottom:9.pt;width:479.98004pt;height:1.71997pt;background-color:#d9d9d9;"> </div>
div class="p" id="p1" style="top:89.17999pt;left:3.18pt;font-family:Times New Roman;font-size:1.0pt;">com</div>"
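The regular expression we are using is [A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,6}
It only matches email addresses in the standard format. Any help would be much appreciated.
Edit: I removed the div start tag because the page was parsing it as text.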
Answer 0 (score: 0)
This works for me.
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Sample input from the question, one div per line.
        String text = "div class=\"p\" id=\"p9\" style=\"top:89.17999pt;left:430.7740pt;font-family:Times New Roman;font-size:1.0pt;\">hello</div>\n"
                + "div class=\"p\" id=\"p10\" style=\"top:89.17999pt;left:484.100pt;font-family:Times New Roman;font-size:1.0pt;\">.</div>\n"
                + "div class=\"p\" id=\"p11\" style=\"top:89.17999pt;left:487.100pt;font-family:Times New Roman;font-size:1.0pt;\">p</div>\n"
                + "<div class=\"p\" id=\"p1\" style=\"top:89.17999pt;left:493.9300pt;font-family:Times New Roman;font-size:1.0pt;\">@</div>\n"
                + "div class=\"p\" id=\"p13\" style=\"top:89.17999pt;left:0.09003pt;font-family:Times New Roman;font-size:1.0pt;\">gmail</div>\n"
                + "div class=\"p\" id=\"p\" style=\"top:89.17999pt;left:33.18pt;font-family:Times New Roman;font-size:1.0pt;\">.</div>\n"
                + "<div class=\"r\" style=\"left:79.84pt;bottom:9.pt;width:479.98004pt;height:1.71997pt;background-color:#d9d9d9;\"> </div>\n"
                + "div class=\"p\" id=\"p1\" style=\"top:89.17999pt;left:3.18pt;font-family:Times New Roman;font-size:1.0pt;\">com</div>\"";
        StringBuilder sb = new StringBuilder();
        String[] tokens = text.split("\n");
        // Group 1 captures the text between the opening tag's '>' and "</div".
        Pattern p = Pattern.compile(".*>(.*)</div.*");
        for (String line : tokens) {
            Matcher m = p.matcher(line);
            if (m.matches()) {
                // Concatenate the text content of every div, in order.
                sb.append(m.group(1));
            }
        }
        // Prints "hello.p@gmail. com"; the space comes from the class="r" div.
        System.out.println(sb.toString());
    }
}
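Once the fragments are joined into one string, the regex from the question can be applied to it. The following is only a sketch under a few assumptions: the class name `EmailScrubber` and the method `removeEmails` are made up for illustration, and the joined text is assumed to contain no stray whitespace inside the address. Since the pattern lists only `A-Z`, it is compiled with `Pattern.CASE_INSENSITIVE` so that lowercase addresses match too.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class EmailScrubber {
    // The question's pattern; CASE_INSENSITIVE lets A-Z also match a-z.
    private static final Pattern EMAIL = Pattern.compile(
            "[A-Z0-9._%+-]+@[A-Z0-9.-]+\\.[A-Z]{2,6}",
            Pattern.CASE_INSENSITIVE);

    // Returns the text with every match of EMAIL removed.
    static String removeEmails(String joinedText) {
        return EMAIL.matcher(joinedText).replaceAll("");
    }

    public static void main(String[] args) {
        // Prints "Contact: , thanks"
        System.out.println(removeEmails("Contact: hello.p@gmail.com, thanks"));
    }
}
```

Note that this only strips the address from the joined text; removing the corresponding divs from the HTML itself would need extra bookkeeping of which lines contributed to the match.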
Edit: if there are more divs, you may need to adjust the pattern so that it matches only the divs that belong to the email.
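One possible adjustment, as a rough sketch: match a line only when its div has class="p", so that the class="r" separator div (whose content is a single space) is skipped. The class name `PDivText` and the method `joinPDivs` are invented for this example, the style attributes in the sample are shortened, and the pattern assumes `class="p"` is the first attribute, as in the sample input.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the original answer.
class PDivText {
    // Matches a whole line only if its div has class="p". The leading '<'
    // is optional because most lines of the sample input have lost it.
    private static final Pattern P_DIV =
            Pattern.compile("<?div class=\"p\"[^>]*>(.*)</div.*");

    // Concatenates the text content of every class="p" div, one div per line.
    static String joinPDivs(String html) {
        StringBuilder sb = new StringBuilder();
        for (String line : html.split("\n")) {
            Matcher m = P_DIV.matcher(line);
            if (m.matches()) {
                sb.append(m.group(1));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Abbreviated sample data modelled on the question's input.
        String html = "div class=\"p\" id=\"p13\" style=\"left:0.09003pt;\">gmail</div>\n"
                + "div class=\"p\" id=\"p\" style=\"left:33.18pt;\">.</div>\n"
                + "<div class=\"r\" style=\"left:79.84pt;\"> </div>\n"
                + "div class=\"p\" id=\"p1\" style=\"left:3.18pt;\">com</div>";
        // Prints "gmail.com"; the class="r" div no longer contributes a space.
        System.out.println(joinPDivs(html));
    }
}
```

With the separator div filtered out, the joined text has no space inside the address, so an email regex can match it as one contiguous string.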