Extracting the data between two tags with Java regular expressions, without the tags

Time: 2012-01-22 04:20:30

Tags: java regex

  

Possible duplicate:
  RegEx match open tags except XHTML self-contained tags

I have a list of 100 names to extract, each sitting between the tags. I need to extract just the data, not the tags, using Java regular expressions.

For example, I need the data "Aaron, Teb", "Abacha, Jui" and "Abashidze, Harry". Each entry is on its own line.

    <a class="listing" href=http://eeee/a/hank_aaron/index.html">Aaron, Teb</a><br>
    <a class="listing" href=http://eeee/t/sani_abacha/index.html">Abacha, Jui</a><br>
    <a class="listing" href=http://eeee/i/aslan_abashidze/index.html">Abashidze, Harry</a><br>

I wrote the following code, but it also extracts the tags. Where did I go wrong? Do I need to replace the tags, or is the regex wrong?

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.URL;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class Main {
        public static void main(String[] args) throws Exception {
            URL oracle = new URL("http://eeee/all/people/index.html");
            BufferedReader in = new BufferedReader(new InputStreamReader(oracle.openStream()));
            String input;
            String REGEX = "<a class=\"listing\"[^>]*>";
            while ((input = in.readLine()) != null) {
                Pattern p = Pattern.compile(REGEX);
                Matcher m = p.matcher(input);
                while (m.find()) {
                    System.out.println(input); // prints the whole line, tags included
                }
            }
            in.close();
        }
    }

1 answer:

Answer 0 (score: 0)

Use this regex:

    (?:<a class=\"listing\"[^>]*>)([^<]*)(?:<)

Group 1 captures the name, so print m.group(1) instead of the whole input line.
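A minimal sketch of the regex in use, assuming the HTML is already in a String (the class name ListingNames and the method extractNames are hypothetical). The anchor tag is matched but not captured, and only m.group(1) is collected, so the tags never reach the output:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListingNames {
    // The answer's regex: the <a class="listing" ...> tag is matched but not captured;
    // group 1 captures everything up to the next '<', i.e. the name.
    private static final Pattern LISTING =
            Pattern.compile("(?:<a class=\"listing\"[^>]*>)([^<]*)(?:<)");

    // Returns the link text of every <a class="listing"> element in the HTML.
    public static List<String> extractNames(String html) {
        List<String> names = new ArrayList<>();
        Matcher m = LISTING.matcher(html);
        while (m.find()) {
            names.add(m.group(1)); // group 1 only, so the tags are excluded
        }
        return names;
    }

    public static void main(String[] args) {
        String html =
              "<a class=\"listing\" href=http://eeee/a/hank_aaron/index.html\">Aaron, Teb</a><br>\n"
            + "<a class=\"listing\" href=http://eeee/t/sani_abacha/index.html\">Abacha, Jui</a><br>\n"
            + "<a class=\"listing\" href=http://eeee/i/aslan_abashidze/index.html\">Abashidze, Harry</a><br>";
        for (String name : extractNames(html)) {
            System.out.println(name); // Aaron, Teb / Abacha, Jui / Abashidze, Harry
        }
    }
}
```

The [^>]* part tolerates the unbalanced quote in the sample href, because it only stops at the closing >.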

P.S. You should move Pattern p = Pattern.compile(REGEX); outside the loop, so the pattern is compiled once instead of once per line.
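To illustrate the P.S., here is a sketch that compiles the Pattern once as a static final field and re-points a single Matcher at each line with Matcher.reset(CharSequence). ListingReader and readNames are hypothetical names, and a StringReader stands in for the URL stream so the example runs offline:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ListingReader {
    // Compiled once when the class loads, not on every loop iteration.
    private static final Pattern LISTING =
            Pattern.compile("(?:<a class=\"listing\"[^>]*>)([^<]*)(?:<)");

    // Reads the input line by line, as the question does, and collects group 1.
    public static List<String> readNames(Reader source) {
        List<String> names = new ArrayList<>();
        Matcher m = LISTING.matcher(""); // one Matcher, re-pointed at each line
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                m.reset(line);
                while (m.find()) {
                    names.add(m.group(1));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return names;
    }

    public static void main(String[] args) {
        // In the question this Reader would be new InputStreamReader(oracle.openStream()).
        String page =
              "<a class=\"listing\" href=http://eeee/a/hank_aaron/index.html\">Aaron, Teb</a><br>\n"
            + "<a class=\"listing\" href=http://eeee/t/sani_abacha/index.html\">Abacha, Jui</a><br>\n";
        for (String name : readNames(new StringReader(page))) {
            System.out.println(name);
        }
    }
}
```

A Pattern is immutable and safe to share between threads; a Matcher is not thread-safe, so each thread would need its own.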