Question

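I want to extract every tag from an HTML document, like the one shown below (everything inside the <>, including the angle brackets themselves). I tried /<.+>/, but it doesn't seem to work.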

<table class="body wrap" cellpadding="0" cellspacing="0" align="center" style="width: 100%;max-width: 600px;background-color: #f4f4f4;">

How can I do this?

Answer 1

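This should work. The .+ in your pattern is greedy, so on a line that contains several tags it matches from the first < all the way to the last >. A negated character class stops at the end of each tag instead. Note that the pattern below deliberately skips closing tags such as </test>.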

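import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HTMLTagMatcher
{
  // '<', then one character that is not '/', '<' or '>' (this skips closing tags),
  // then any run of characters other than '<' and '>', then '>'.
  // The second class uses '*' rather than '+' so one-letter tags such as <p> also match.
  private static final String REGEX = "<[^/<>][^<>]*>";
  private static final String INPUT = "<test><blah /><test2></test><best><blargh></best><outside>";

  public static void main(String[] args) {
    Pattern p = Pattern.compile(REGEX);
    Matcher match = p.matcher(INPUT);
    // Print each opening or self-closing tag on its own line.
    while (match.find()) {
      System.out.println(match.group());
    }
  }
}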
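
If you also want the closing tags, which is how I read "every tag" in the question, a rough sketch along the following lines may help. It compares the greedy pattern from the question with a negated character class on a shortened version of the table markup. The class name TagExtractor, the helper findAll, and the sample string are made up for illustration. The sketch also assumes no attribute value contains a literal < or >, which real-world HTML does not guarantee.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TagExtractor {
    // The pattern from the question: '.+' is greedy and runs to the last '>' on the line.
    public static final Pattern GREEDY = Pattern.compile("<.+>");
    // '[^<>]+' cannot cross a '>', so each match ends with its own tag.
    // Matches opening, closing and self-closing tags alike.
    public static final Pattern ANY_TAG = Pattern.compile("<[^<>]+>");

    // Hypothetical helper: collects every match of the pattern in the input.
    public static List<String> findAll(Pattern pattern, String html) {
        List<String> result = new ArrayList<>();
        Matcher m = pattern.matcher(html);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        // Shortened, single-line variant of the markup in the question.
        String html = "<table class=\"body wrap\" cellpadding=\"0\"><tr><td>Hi</td></tr></table>";
        // One match spanning the whole string.
        System.out.println(findAll(GREEDY, html).size());
        // Six separate tags, from <table ...> through </table>.
        findAll(ANY_TAG, html).forEach(System.out::println);
    }
}
```

For anything beyond simple, well-formed markup (comments, scripts, attribute values containing angle brackets), a regex like this is likely to misfire, and an HTML parser is probably the safer choice.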
