Removing duplicate XML tags

Date: 2013-06-13 20:00:02

Tags: java xml tags

I am receiving some XML like this:

<cite id="0ac50429-bfbd-74e5-81bf-be29583cba3b">
<cite id="0ac50429-bfbd-74e5-81bf-be2a36aec2df">
<cite id="0ac50429-bfbd-74e5-81bf-be3d125bdc1c">Some Text
</cite>
</cite>
</cite>
<p>random text</p>
<cite id="0ac50429-bfbd-74e5-81bf-be29583cba3b">
<cite id="0ac50429-bfbd-74e5-81bf-be2a36aec2df">
<cite id="0ac50429-bfbd-74e5-81bf-be3d125bdc1c">
More text
</cite>
</cite>
</cite>

As you can see, the same text is wrapped in several nested tags of the same kind, and I only need one tag per piece of text:

<cite id="0ac50429-bfbd-74e5-81bf-be3d125bdc1c">Some Text</cite>
<p>random text</p>
<cite id="0ac50429-bfbd-74e5-81bf-be29583cba3b">More text</cite>

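But I can't find a good way to get rid of the extra tags. Does anyone have a clue? I tried to get the last child, but I couldn't reach the elements. I also tried regular expressions: I can match the last node, but I can't replace the tags correctly to get the desired XML. Thanks!

This is my solution (I can't answer my own question, so I am writing it here):

I know it is not the best and could be done better, but it works.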

import java.util.regex.Matcher;
import java.util.regex.Pattern;

// simpleRegex must match ONE tag, including any whitespace that separates it from the next one.
private static String replaceNodes(String simpleRegex, String xml)
{
    // Match every run of repeated nodes, e.g. <cite id="asda"><cite id="asda"><cite id="asda">
    String expresionRegular = "(" + simpleRegex + ")+";

    Pattern pattern = Pattern.compile(expresionRegular);
    Matcher matcher = pattern.matcher(xml);

    StringBuffer result = new StringBuffer();
    while (matcher.find())
    {
        // group(1) holds the last node of the run, e.g. <cite id="asda">
        String tagUnicoEnd = matcher.group(1);

        // Replace the whole run of repeated nodes with that single node.
        matcher.appendReplacement(result, Matcher.quoteReplacement(tagUnicoEnd));
    }
    matcher.appendTail(result);  // copy the text after the last run

    return result.toString();
}
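
For illustration, here is a minimal sketch of how replaceNodes might be called on the sample input: once for the opening tags and once for the closing tags. The class name ReplaceNodesDemo, the helper collapseCites, the two patterns OPEN_CITE and CLOSE_CITE, and the shortened ids a, b, c are all assumptions made for this example; they are not part of the original post. A compact variant of replaceNodes is repeated inside the class so the example runs on its own.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ReplaceNodesDemo {

    // Hypothetical patterns: one <cite ...> or </cite> tag plus the whitespace after it.
    static final String OPEN_CITE = "<cite id=\"[^\"]*\">\\s*";
    static final String CLOSE_CITE = "</cite>\\s*";

    // Collapses each run of consecutive matches of simpleRegex into the last match of the run.
    static String replaceNodes(String simpleRegex, String xml) {
        Matcher matcher = Pattern.compile("(" + simpleRegex + ")+").matcher(xml);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            // group(1) is the last repetition of the group inside the run
            matcher.appendReplacement(result, Matcher.quoteReplacement(matcher.group(1)));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    // First collapse the opening tags, then the closing tags.
    static String collapseCites(String xml) {
        return replaceNodes(CLOSE_CITE, replaceNodes(OPEN_CITE, xml));
    }

    public static void main(String[] args) {
        String xml = "<cite id=\"a\">\n<cite id=\"b\">\n<cite id=\"c\">Some Text\n"
                + "</cite>\n</cite>\n</cite>\n<p>random text</p>";
        System.out.println(collapseCites(xml));
    }
}
```

Note that this works purely on text: it assumes that every run of opening tags is closed by an equally long run of closing tags. If a cite element can hold both text and another cite, the two passes may remove different numbers of tags and leave the XML unbalanced. The tag that survives is the innermost one of each run.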

1 answer:

Answer 0 (score: 0)

In the end I found a way. I know it is not the best and could be done better: it is the same replaceNodes method posted at the end of the question above, which collapses each run of repeated tags matched by simpleRegex into a single tag.
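
Since the question mentions trying to reach the last child, here is a rough sketch of a parser-based alternative that does this with the standard DOM API instead of regular expressions. It is not the poster's method. The class name CiteFlattener, its method names, and the wrapper element root are assumptions made for this example; the wrapper is needed because the sample has several top-level elements and a parser expects exactly one. The sketch keeps the innermost cite of each chain, as the regex version does.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

final class CiteFlattener {

    // Returns the single <cite> child of e if e holds nothing else but whitespace, else null.
    static Element onlyCiteChild(Element e) {
        Element found = null;
        for (Node n = e.getFirstChild(); n != null; n = n.getNextSibling()) {
            boolean isCite = n.getNodeType() == Node.ELEMENT_NODE && "cite".equals(n.getNodeName());
            boolean isBlank = n.getNodeType() == Node.TEXT_NODE && n.getTextContent().trim().isEmpty();
            if (isCite && found == null) {
                found = (Element) n;
            } else if (!isBlank) {
                return null;  // real text, a second cite, or any other node: not a pure wrapper
            }
        }
        return found;
    }

    // Replaces every chain of wrapper <cite> elements below node with its innermost <cite>.
    static void flatten(Node node) {
        Node child = node.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling();  // remember the sibling before changing the tree
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element current = (Element) child;
                if ("cite".equals(current.getNodeName())) {
                    Element inner = current;
                    Element deeper;
                    while ((deeper = onlyCiteChild(inner)) != null) {
                        inner = deeper;
                    }
                    if (inner != current) {
                        node.insertBefore(inner, current);  // moves inner out of its old parent
                        node.removeChild(current);          // drops the now useless wrappers
                        current = inner;
                    }
                }
                flatten(current);
            }
            child = next;
        }
    }

    // Wraps the fragment in a hypothetical <root> element, parses it and flattens it.
    static Document parseAndFlatten(String fragment) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader("<root>" + fragment + "</root>")));
        flatten(doc.getDocumentElement());
        return doc;
    }

    // Serializes the document back to a string without the XML declaration.
    static String toXml(Document doc) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<cite id=\"a\">\n<cite id=\"b\">\n<cite id=\"c\">Some Text\n"
                + "</cite>\n</cite>\n</cite>\n<p>random text</p>";
        System.out.println(toXml(parseAndFlatten(xml)));
    }
}
```

Compared with the regex version, this only removes a cite when it contains nothing but one other cite and whitespace, so the result stays well-formed even when the nesting is irregular. The cost is that the input has to be parseable XML in the first place.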