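How do I extract a substring (HTML) and another substring (matched by a regex) and put them back in the correct format?

Time: 2017-02-05 10:25:13

Tags: java regex jsp

I have a giant string containing the markup below. I need to process it so that any HTML in it is kept as it is, and every substring matching the pattern described below is turned into links that are put back in the same place, in the proper format.

Example: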

<div id="contentPermission"> 
  [[MI44,MI304,MI409,MI45,MI264,MI108,MI46,MI47,MI48,MI49,MI50,MI51,MI52,MI58,MI530]]

</div>
<div>&nbsp;</div>

<p>&nbsp;</p>

<div>&nbsp;</div>

<p>&nbsp;</p>

<p>[[LP1137]]</p>

Pattern: starts with "[[" and ends with "]]", i.e. of the form:

[[anything between these brackets]]

So the output should look like this:

<div id="contentPermission"> 

  <a href="index?page=content&id=MI44"></a>

  <a href="index?page=content&id=MI304"></a>

  <a href="index?page=content&id=MI409"></a>

 ......

 ......
</div>
<div>&nbsp;</div>

<p>&nbsp;</p>

<div>&nbsp;</div>

<p>&nbsp;</p>

<p><a href="index?page=content&id=LP1137"></a></p>
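To make the pattern above concrete, here is a minimal sketch that finds each [[...]] group and lists the comma-separated codes inside it. The class name `BracketFinder` and the method `findCodes` are hypothetical names used only for this illustration, and the sketch assumes the brackets are never nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BracketFinder {
    // "[[", then one or more characters that are not "]", then "]]".
    // Group 1 captures the text between the brackets.
    private static final Pattern BRACKETS = Pattern.compile("\\[\\[([^\\]]+)\\]\\]");

    /** Returns every code found inside any [[...]] group, in document order. */
    public static List<String> findCodes(String html) {
        List<String> codes = new ArrayList<>();
        Matcher m = BRACKETS.matcher(html);
        while (m.find()) {
            for (String code : m.group(1).split(",")) {
                String trimmed = code.trim();
                if (!trimmed.isEmpty()) {
                    codes.add(trimmed);
                }
            }
        }
        return codes;
    }

    public static void main(String[] args) {
        String html = "<div>[[MI44,MI304]]</div><p>[[LP1137]]</p>";
        System.out.println(findCodes(html)); // prints [MI44, MI304, LP1137]
    }
}
```

Listing the codes first can be a convenient way to check that the pattern matches what you expect before doing any replacement.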

2 answers:

Answer 0 (score: 1)

Solution

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkReplacer {

    public static void main(String[] args) {

        String str = "<div id=\"contentPermission\">"
                + "  [[MI44,MI304,MI409,MI45,MI264,MI108,MI46,MI47,MI48,MI49,MI50,MI51,MI52,MI58,MI530]]"
                + "</div><div>&nbsp;</div><p>&nbsp;</p><div>&nbsp;</div><p>&nbsp;</p><p>[[LP1137]]</p>";

        System.out.println("Before " + str + "\n\n\n");

        // Matches "[[", one or more characters other than "]", then "]]";
        // group 1 captures the comma-separated codes between the brackets.
        Pattern pattern = Pattern.compile("\\[\\[([^\\]]+)\\]\\]");
        Matcher matcher = pattern.matcher(str);

        while (matcher.find()) {

            // Build one anchor tag per code in the matched group.
            StringBuilder urls = new StringBuilder();
            for (String code : matcher.group(1).split(",")) {
                urls.append("<a href=\"index?page=content&id=" + code.trim() + "\"></a>\n");
            }

            // Replace the first remaining [[...]] group, then rescan the updated string.
            // quoteReplacement keeps any '$' or '\' in the generated links literal.
            str = matcher.replaceFirst(Matcher.quoteReplacement(urls.toString()));
            matcher = pattern.matcher(str);
        }

        System.out.println("Replaced " + str);
    }
}
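The loop above rescans the string after every replacement. As a possible alternative, the sketch below does the same substitution in a single pass with `Matcher.appendReplacement` and `Matcher.appendTail`. The class name `SinglePassLinker` and the method `linkify` are hypothetical, and the sketch assumes the codes need no URL or HTML escaping.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SinglePassLinker {
    // Group 1 captures the comma-separated codes between "[[" and "]]".
    private static final Pattern BRACKETS = Pattern.compile("\\[\\[([^\\]]+)\\]\\]");

    /** Replaces every [[a,b,...]] group with one anchor tag per code. */
    public static String linkify(String html) {
        Matcher matcher = BRACKETS.matcher(html);
        // StringBuffer is accepted by appendReplacement on every Java version.
        StringBuffer out = new StringBuffer();
        while (matcher.find()) {
            StringBuilder links = new StringBuilder();
            for (String code : matcher.group(1).split(",")) {
                links.append("<a href=\"index?page=content&id=")
                     .append(code.trim())
                     .append("\"></a>\n");
            }
            // Copies the text before the match, then the links in place of the match.
            // quoteReplacement keeps any '$' or '\' in the links literal.
            matcher.appendReplacement(out, Matcher.quoteReplacement(links.toString()));
        }
        // Copies whatever follows the last match.
        matcher.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(linkify("<div>[[MI44,MI304]]</div><p>[[LP1137]]</p>"));
    }
}
```

Because the input is only scanned once, this version avoids creating a new `Matcher` and a new string for every bracketed group.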

Answer 1 (score: 1)

Another solution that relies on regular expressions only (no split, inner loop, or substring calls):

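// Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
String content = "<div id=\"contentPermission\">[[MI44,MI304,MI409,MI45,MI264,MI108,MI46,MI47,MI48,MI49,MI50,MI51,MI52,MI58,MI530]]</div><div>&nbsp;</div><p>&nbsp;</p><div>&nbsp;</div><p>&nbsp;</p><p>[[LP1137]]</p>";

// The lookarounds match only the text between "[[" and "]]".
Pattern p = Pattern.compile("(?<=\\[\\[).*?(?=\\]\\])");
Matcher m = p.matcher(content);

// m scans the original string; each pass replaces the first [[...]] group still left in content.
while (m.find()) {
  // Turn each code (with its trailing comma and spaces) into an anchor tag.
  String links = m.group().replaceAll("(\\w+),?\\s*", "<a href=\"index?page=content&id=$1\"></a>");
  content = content.replaceFirst("\\[\\[.*?\\]\\]", Matcher.quoteReplacement(links));
}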
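If Java 9 or later is available, the same idea can be written without an explicit loop by passing a function to `Matcher.replaceAll`. This is only a sketch under that assumption; the class name `LambdaLinker` and the method `linkify` are hypothetical.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class LambdaLinker {
    // Group 1 captures the comma-separated codes between "[[" and "]]".
    private static final Pattern BRACKETS = Pattern.compile("\\[\\[([^\\]]+)\\]\\]");

    /** Replaces every [[a,b,...]] group with one anchor tag per code. */
    public static String linkify(String html) {
        // Matcher.replaceAll(Function<MatchResult, String>) exists since Java 9.
        // The function's result is treated as a replacement string, so it is
        // quoted to keep any '$' or '\' literal.
        return BRACKETS.matcher(html).replaceAll(match ->
                Matcher.quoteReplacement(
                        Arrays.stream(match.group(1).split(","))
                              .map(String::trim)
                              .map(code -> "<a href=\"index?page=content&id=" + code + "\"></a>")
                              .collect(Collectors.joining())));
    }

    public static void main(String[] args) {
        System.out.println(linkify("<div>[[MI44,MI304]]</div><p>[[LP1137]]</p>"));
    }
}
```

Unlike the loop above, this never re-reads the partially replaced string, so text produced by an earlier replacement cannot be matched again.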