StackOverflowError when trying to parse deeply nested groups in a regular expression

Time: 2019-06-29 22:25:50

Tags: java regex recursion stack-overflow tail-recursion

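I am trying to write a parser for regular expressions, and I am confident it can handle almost anything thrown at it (for example, it parses this monstrosity of a regex without problems: http://www.ex-parrot.com/~pdw/Mail-RFC822-Address.html). However, my parser does throw a StackOverflowError on input like the one generated below. I know this particular regex does nothing useful, but it is the simplest example. How can I rewrite my code to remove the recursion?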

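// Requires: import java.util.stream.IntStream;
// Builds 1000 opening parentheses, then "[a-z]+", then 1000 closing parentheses.
public static String generateOverflowingRegex()
{
    StringBuilder sb = new StringBuilder();
    IntStream.range(0, 1000).forEach(i -> sb.append("("));
    sb.append("[a-z]+");
    IntStream.range(0, 1000).forEach(i -> sb.append(")"));
    return sb.toString();
}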

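I have looked into trampolining the recursion, but I believe that only works for tail recursion, and none of my recursive methods are tail-recursive. I have also considered converting to an explicit Stack on the heap, but as far as I can tell that makes the code unreadable.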
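For reference, here is a minimal sketch of what a trampoline looks like in Java, to illustrate why it fits tail calls so naturally. The names `TrampolineDemo`, `Step`, `done`, `run`, and `countOpen` are hypothetical and are not part of the parser in the question. Each call returns either a finished value or a deferred next step, and a plain loop drives the steps, so the JVM call stack never grows. This only works directly when the recursive call is the last thing the method does. The parser's methods do more work after each recursive call returns, which is why the technique does not carry over as-is.

```java
final class TrampolineDemo {
    /** One bounce of the trampoline: either a finished value or a deferred next step. */
    interface Step<T> {
        default boolean isDone() { return false; }
        default T value() { throw new IllegalStateException("not finished"); }
        Step<T> next();
    }

    /** Wraps a final result. */
    static <T> Step<T> done(T v) {
        return new Step<T>() {
            @Override public boolean isDone() { return true; }
            @Override public T value() { return v; }
            @Override public Step<T> next() { throw new IllegalStateException("already finished"); }
        };
    }

    /** Drives the steps in a loop instead of nesting calls on the JVM stack. */
    static <T> T run(Step<T> step) {
        while (!step.isDone()) {
            step = step.next();
        }
        return step.value();
    }

    /** Tail-recursive in spirit: counts the leading '(' characters starting at index i. */
    static Step<Integer> countOpen(String s, int i) {
        if (i >= s.length() || s.charAt(i) != '(') {
            return done(i);
        }
        // The "recursive call" is deferred as a lambda, not executed here.
        return () -> countOpen(s, i + 1);
    }
}
```

Calling `TrampolineDemo.run(TrampolineDemo.countOpen(pattern, 0))` counts the leading opening parentheses of `pattern` in constant stack depth, however many there are.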

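Only the relevant part of the full code is shown below; #tryParse() is the main entry point.

Unrolling #setupParseState() into #tryParse() is fairly easy, but things get more complicated once you enter #doParse(), because it calls methods such as #isOr(), which call back into #setupParseState().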

    public State tryParse(String pattern) throws InterruptedException
    {
        State state = new State();
        state.setSource(pattern);
        setupParseState("or", state);
        return state;
    }

    protected boolean setupParseState(String typeToCheck, State state) throws InterruptedException
    {
        //init new parse state
        state.openNode(typeToCheck);
        state.getCurrentNode().setStartIndex(state.getIndex());
        if (doParse(typeToCheck, state))
        {
            state.getCurrentNode().setEndIndex(state.getIndex());
            //set current node as complete set parent node as current node and return to parent state.
            state.closeNode();
            return true;
        }
        //parsing failed so drop current state and return to previous state
        state.discardNode();
        return false;
    }

    protected boolean doParse(String name, State state) throws InterruptedException
    {
        //remember the start position so a failed attempt can backtrack
        int tempIndex = state.getIndex();
        boolean success = false;
        switch (name)
        {
            case "or":
                success = isOr(state);
                break;
            case "sequence":
                success = iSequence(state);
                break;
            case "term":
                success = isTerm(state);
                break;
        }
        if (!success)
        {
            state.index = tempIndex;
        }
        return success;
    }

    protected boolean isOr(State state) throws InterruptedException
    {
        state.setNodeValue("OR: match either of the following");
        while (true)
        {
            if (!setupParseState("sequence", state))
            {
                return false;
            }
            Character temp = state.getCurrentState();
            if (temp != null && temp.charValue() == '|') state.advanceState();
            else break;
        }
        return true;
    }

    protected boolean iSequence(State state) throws InterruptedException
    {
        state.setNodeValue("Sequence: match all of the following in order");
        boolean isEmpty = true;
        while (true)
        {
            if (!setupParseState("term", state)) break;
            isEmpty = false;
        }
        if (!isEmpty)
        {
            return true;
        }
        if (state.getCurrentState() == null || state.getCurrentState().charValue() == '|' || state.getCurrentState().charValue() == ')')
        {
            state.setNodeValue("Empty");
            return true;
        }
        return false;
    }

    protected boolean isTerm(State state) throws InterruptedException
    {
        if (setupParseState("assertion", state))
        {
            return true;
        }
        if (setupParseState("quantatom", state))
        {
            return true;
        }
        return false;
    }
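
As a rough illustration of the heap-based stack idea mentioned earlier, the sketch below parses a heavily reduced grammar (groups, `|`, and plain characters) without any recursion. It is not a drop-in replacement for the code above: `IterativeRegexParser` and `OrNode` are hypothetical names, the `State` class from the question is not used, and quantifiers, character classes, escapes, and assertions are deliberately treated as ordinary characters. The point is only that the "or / sequence / term" nesting can be tracked with a `Deque` of open nodes instead of JVM stack frames.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

final class IterativeRegexParser {
    /** An "or" node: a list of alternatives, each a sequence of terms (Character or OrNode). */
    static final class OrNode {
        final List<List<Object>> alternatives = new ArrayList<>();

        OrNode() {
            alternatives.add(new ArrayList<>()); // every node starts with one empty sequence
        }

        /** The sequence currently being filled. */
        List<Object> current() {
            return alternatives.get(alternatives.size() - 1);
        }
    }

    /** Parses the reduced grammar in a single loop; nesting depth is limited only by heap size. */
    static OrNode parse(String pattern) {
        Deque<OrNode> open = new ArrayDeque<>(); // enclosing groups still waiting for ')'
        OrNode node = new OrNode();              // node being filled; starts as the root
        for (int i = 0; i < pattern.length(); i++) {
            char c = pattern.charAt(i);
            switch (c) {
                case '(': {
                    open.push(node);             // suspend the enclosing node
                    node = new OrNode();         // start parsing the group body
                    break;
                }
                case ')': {
                    if (open.isEmpty()) {
                        throw new IllegalArgumentException("Unmatched ')' at index " + i);
                    }
                    OrNode parent = open.pop();  // resume the enclosing node
                    parent.current().add(node);  // the finished group becomes one term
                    node = parent;
                    break;
                }
                case '|': {
                    node.alternatives.add(new ArrayList<>()); // start the next alternative
                    break;
                }
                default: {
                    node.current().add(c);       // simplification: any other char is a literal term
                }
            }
        }
        if (!open.isEmpty()) {
            throw new IllegalArgumentException("Unclosed '(' in pattern");
        }
        return node;
    }
}
```

With this shape, `IterativeRegexParser.parse(generateOverflowingRegex())` returns a root node whose single term is a chain of 1000 nested groups. Any later walk over the resulting tree would also need to be iterative to stay safe at that depth.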

0 answers:

No answers