Sequential vs. nested loop structure parsing using regular expressions

Time: 2012-01-31 09:42:27

Tags: java regex quantifiers

The input can be of form 1, of form 2, or a combination of both.

  1. Sequential

        ...
        startLoop
          setSomething
        endLoop
    
        startLoop
          setSomething
        endLoop
        ...
    

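The regex I am using is (startLoop.+?endLoop)+? to get each loop block as a matcher group. This works for the sequential case, where I access setSomething in each block and change it.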
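
As a minimal sketch of the sequential case: the pattern below is the one from the question with the redundant outer group and quantifier dropped, and it assumes Pattern.DOTALL so that '.' can cross line breaks (the question does not say which flags were used). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SequentialLoops {
    // DOTALL lets '.' match line breaks; the lazy '+?' stops at the nearest endLoop.
    private static final Pattern LOOP =
            Pattern.compile("startLoop.+?endLoop", Pattern.DOTALL);

    /** Returns each startLoop...endLoop block found in the input, in order. */
    static List<String> loopBlocks(String input) {
        List<String> blocks = new ArrayList<>();
        Matcher m = LOOP.matcher(input);
        while (m.find()) {
            blocks.add(m.group());
        }
        return blocks;
    }

    public static void main(String[] args) {
        String input = "startLoop\n  setSomething\nendLoop\n\n"
                     + "startLoop\n  setSomething\nendLoop\n";
        for (String block : loopBlocks(input)) {
            System.out.println("--- block ---");
            System.out.println(block);
        }
    }
}
```

Because the match is lazy, each find() call ends at the first endLoop it meets, which is exactly what sequential blocks need.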

  2. Nested

          ...
          startLoop
            setSomething1.1
            startLoop
              setSomething2.1
              startLoop
                setSomething3
              endLoop
              setSomething2.2
            endLoop
            setSomething1.2
          endLoop
          ...
      

I wrote something like (startLoop.+?startLoop)+? but this only gives me access to setSomething1.1.

I cannot come up with a regex that gives me access to every setSomething regardless of what kind of loop structure the input has.
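
To make the problem concrete, here is a small sketch that applies the lazy sequential pattern to a nested input. It assumes Pattern.DOTALL and a two-level example of my own; the class and helper names are hypothetical. The match ends at the first endLoop, so it contains more startLoop tokens than endLoop tokens and never reaches the statements after the inner loop.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NestedMismatch {
    /** Returns the first lazy startLoop...endLoop match, or null if there is none. */
    static String firstLazyMatch(String input) {
        Matcher m = Pattern.compile("startLoop.+?endLoop", Pattern.DOTALL).matcher(input);
        return m.find() ? m.group() : null;
    }

    /** Counts non-overlapping occurrences of a literal token in the text. */
    static int count(String text, String token) {
        int n = 0;
        for (int i = text.indexOf(token); i >= 0; i = text.indexOf(token, i + token.length())) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String nested = "startLoop\n setSomething1.1\n startLoop\n  setSomething2.1\n"
                      + " endLoop\n setSomething1.2\nendLoop\n";
        String match = firstLazyMatch(nested);
        System.out.println(match);
        System.out.println("startLoop count: " + count(match, "startLoop"));
        System.out.println("endLoop count:   " + count(match, "endLoop"));
    }
}
```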

Thanks for your help.

2 Answers:

Answer 0: (score: 3)

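I don't think it is possible to capture what you describe with regular expressions alone. Regular expressions can only describe regular languages, whereas the nested-loop case you describe is essentially a context-free language. According to the Chomsky hierarchy, the regular languages form a strict subset of the context-free languages, so regular expressions cannot capture every context-free language.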

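CFGs vs. regular expressions

Context-free grammars are more powerful than regular expressions:

  • Any language that can be generated by a regular expression can also be generated by a context-free grammar.
  • There are languages that can be generated by a context-free grammar but not by any regular expression.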

  • Reference: Chomsky hierarchy
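
One common way around this limitation is to scan the input and track the nesting depth explicitly, with a counter standing in for the stack of a pushdown parser. The sketch below is not from the question: it assumes one token per line, as in the examples there, and the names LoopScanner, Setter and findSetters are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class LoopScanner {
    /** One setSomething line together with the loop depth it was found at. */
    static final class Setter {
        final String name;
        final int depth;

        Setter(String name, int depth) {
            this.name = name;
            this.depth = depth;
        }
    }

    /** Collects every setSomething line that sits inside at least one loop. */
    static List<Setter> findSetters(String input) {
        List<Setter> found = new ArrayList<>();
        int depth = 0; // number of loops currently open
        for (String raw : input.split("\\R")) {
            String line = raw.trim();
            if (line.equals("startLoop")) {
                depth++;
            } else if (line.equals("endLoop")) {
                if (depth == 0) {
                    throw new IllegalArgumentException("endLoop without startLoop");
                }
                depth--;
            } else if (line.startsWith("setSomething") && depth > 0) {
                found.add(new Setter(line, depth));
            }
        }
        if (depth != 0) {
            throw new IllegalArgumentException("missing endLoop");
        }
        return found;
    }
}
```

The same loop handles sequential, nested and mixed input, because the counter is the only state it needs to know whether a line is inside a loop.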

Answer 1: (score: 0)

I tried this and it worked. It is a ridiculous way to do it, but it works for now.

    // Requires: import java.util.regex.Matcher; import java.util.regex.Pattern;
    private static String normalize(String input) {
        //Final string is held here
        StringBuilder markerString = new StringBuilder(input);
        //Look for the occurrences of startLoop-endLoop structures across lines
        Pattern p1 = Pattern.compile("(startLoop.+?endLoop)+?",Pattern.DOTALL);
        Matcher m1 = p1.matcher(markerString.toString());
        while(m1.find()){
            /* startLoop-endLoop structure found
             * Make sure length of StringBuilder remains same
             */
            markerString.setLength(input.length());
            //group will now contain the matched subsequence of the full string
            StringBuilder group = new StringBuilder(m1.group());
            /* Look for occurrences of startLoop within the matched group
             * and maintain a counter for the no of occurrences 
             */
            Pattern p2 = Pattern.compile("(startLoop)+?",Pattern.DOTALL);
            Matcher m2 = p2.matcher(group.toString());
            int loopCounter = 0;
            while(m2.find()){
                loopCounter++;
            }
            /* this takes care of the sequential loops scenario as well as matched group
             * in nested loop scenario
             */
            markerString.replace(m1.start(), m1.end(), m1.group().
                             replaceAll("setSomething", "setThisthing"));
            /* For the no of times that startLoop occurred in the matched group,
             * do the following
             * 1. Find the next index of endLoop after the matched group's end in the full string
             * 2. Read the subsequence between matched group's end and endIndex
             * 3. Replace all setSomething with setThisthing in the subsequence
             * 4. Replace subsequence in markerString
             * 5. Decrement loopCounter
             */
            int previousEndIndex = m1.end();
            int currentEndIndex = -1;
            while(loopCounter>1){
                currentEndIndex = markerString.indexOf("endLoop",previousEndIndex);
                if(currentEndIndex < 0) break; // unbalanced input: no further endLoop to pair with
                String replacerString  = markerString.substring(previousEndIndex,currentEndIndex);
                replacerString =  replacerString.replaceAll("setSomething", "setThisthing");
                markerString.replace(previousEndIndex, currentEndIndex, replacerString);
                previousEndIndex = currentEndIndex+7;
                loopCounter--;
            }
        }
        return markerString.toString();
    }
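
For comparison, the same renaming can be sketched as a single pass that uses a regex only to find tokens and a counter to track nesting. This is an alternative under my own assumptions, not the code above: LoopRewriter is a hypothetical name, the replacement name setThisthing is taken from the answer, and an unmatched endLoop is simply ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LoopRewriter {
    // The regex only tokenizes; nesting is tracked by the depth counter below.
    private static final Pattern TOKEN =
            Pattern.compile("startLoop|endLoop|setSomething");

    /** Renames setSomething to setThisthing wherever it appears inside a loop. */
    static String normalize(String input) {
        Matcher m = TOKEN.matcher(input);
        StringBuffer out = new StringBuffer();
        int depth = 0; // number of loops currently open
        while (m.find()) {
            String token = m.group();
            String replacement = token;
            if (token.equals("startLoop")) {
                depth++;
            } else if (token.equals("endLoop")) {
                depth = Math.max(0, depth - 1); // tolerate a stray endLoop
            } else if (depth > 0) {
                replacement = "setThisthing";
            }
            // Copies the text since the previous match, then the (possibly renamed) token.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Unlike the index arithmetic above, this version does not depend on the replacement having the same length as the original name.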