Parsing String elements between nested parentheses

Time: 2014-12-04 01:34:01

Tags: java

I am trying to write a small program that extracts information from nested parentheses. For example, given the string:

"content (content1 (content2, content3) content4 (content5 (content6, content7))"

I would like to get this back (in an ArrayList or some other Collection):

["content", "content1", "content2, content3", "content4", "content5", "content6, content7"]

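Is there an existing library or algorithm that could help me with this?

Thanks in advance!

Edit

Thanks for the suggestions, but content2 and content3 should be kept in the same string in the final list, because they are inside the same pair of parentheses.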

3 answers:

Answer 0 (score: 2)

This seems to work for the one example given above:

import java.util.ArrayList; 

public class ParseParenthesizedString {
    public enum States { STARTING, TOKEN, BETWEEN }
    public static void main(String[] args)
    {
        ParseParenthesizedString theApp = new ParseParenthesizedString();
        theApp.Answer();
    }

    public void Answer()
    {
        String theString = 
           "content (content1 (content2, content3) content4 (content5 (content6, content7))";
        // wants:
        // ["content", "content1", "content2, content3", "content4", "content5", "content6, content7"]
        States state = States.STARTING;
        ArrayList<String> theStrings = new ArrayList<String>();
        StringBuilder temp = new StringBuilder();

        for (int i = 0; i < theString.length() ; i++)
        {
            char cTemp = theString.charAt(i);
            switch (cTemp)
            {
                case '(':
                {
                    if (state == States.STARTING)  state = States.BETWEEN;
                    else if (state == States.BETWEEN)  {} 
                    else if (state == States.TOKEN )
                    {
                        state = States.BETWEEN;
                        theStrings.add(temp.toString().trim());
                        temp.delete(0,temp.length());
                    }
                    break;
                }
                case ')':
                {
                    if (state == States.STARTING) 
                    {  /* this is an error */ }
                    else if (state == States.TOKEN) 
                    {
                        theStrings.add(temp.toString().trim());
                        temp.delete(0,temp.length());
                        state = States.BETWEEN;
                    } 
                    else if (state == States.BETWEEN ) {}
                    break;
                }
                default:
                {
                    state = States.TOKEN;
                    temp.append(cTemp);
                }
            }
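        }

        // Flush a trailing token when the input does not end with a parenthesis.
        if (state == States.TOKEN && temp.toString().trim().length() > 0)
        {
            theStrings.add(temp.toString().trim());
        }

        PrintArrayList(theStrings);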
    }
    public static void PrintArrayList(ArrayList<String> theList)
    {    
        System.out.println("The ArrayList with " 
                + theList.size() + " elements:");
        for (int i = 0; i < theList.size(); i++)
        {
            System.out.println(i + ":" + theList.get(i));
        }
    }
}

Output:

The ArrayList with 6 elements:
0:content
1:content1
2:content2, content3
3:content4
4:content5
5:content6, content7
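
The same idea can be sketched as a reusable method that returns the list instead of printing it. This is only a minimal sketch: the names `ParenTokens`, `extract`, and `flush` are made up for illustration, and it assumes that blank text between two parentheses (for example `") ("`) should be skipped rather than stored as an empty string. It does not check that the parentheses are balanced.

```java
import java.util.ArrayList;
import java.util.List;

class ParenTokens {
    // Hypothetical helper: collects the trimmed text found between consecutive parentheses.
    static List<String> extract(String input) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '(' || c == ')') {
                flush(current, tokens);   // a parenthesis ends the current token
            } else {
                current.append(c);        // commas and spaces stay inside the token
            }
        }
        flush(current, tokens);           // text after the last parenthesis, if any
        return tokens;
    }

    // Adds the buffered text as a token unless it is blank, then clears the buffer.
    private static void flush(StringBuilder current, List<String> tokens) {
        String token = current.toString().trim();
        if (!token.isEmpty()) {
            tokens.add(token);
        }
        current.setLength(0);
    }

    public static void main(String[] args) {
        String s = "content (content1 (content2, content3) content4 (content5 (content6, content7))";
        List<String> tokens = extract(s);
        for (int i = 0; i < tokens.size(); i++) {
            System.out.println(i + ":" + tokens.get(i));
        }
    }
}
```

Because both kinds of parenthesis simply end the current token, the nesting depth is not tracked; if the depth matters, a counter incremented on `(` and decremented on `)` could be added.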

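Answer 1 (score: 0)

Java's String.split() will do the job for you. It takes a regular expression that defines the delimiter between tokens. In your case the delimiter appears to be a parenthesis or a comma, optionally surrounded by whitespace on either side. So this should do the trick: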

String[] result = s.split("\\s*[\\(\\),]+\\s*");
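
Note that the character class in that pattern includes the comma, so `content2, content3` ends up as two separate elements, which is not what the edit to the question asks for. A minimal sketch, assuming commas should stay inside a token, is to drop the comma from the class. The names `SplitOnParens` and `splitOnParens` are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class SplitOnParens {
    // Hypothetical helper: splits on runs of parentheses and absorbs the whitespace around them.
    static List<String> splitOnParens(String s) {
        return Arrays.asList(s.split("\\s*[()]+\\s*"));
    }

    public static void main(String[] args) {
        String s = "content (content1 (content2, content3) content4 (content5 (content6, content7))";
        for (String token : splitOnParens(s)) {
            System.out.println("\"" + token + "\"");
        }
    }
}
```

For the string in the question this yields the six expected strings. String.split() drops trailing empty strings but keeps a leading one, so an input that starts with a parenthesis would produce an empty first element that may need to be filtered out.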

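Answer 2 (score: -1)

If the parentheses do not matter to you (meaning the result does not depend on the bracketing), then String.split with a simple regular expression may do: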

String[] result = input.split("[ ,()]+");
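
To see what this pattern actually produces, here is a small runnable sketch. The names `SplitOnAll` and `splitOnAll` are made up for illustration. Since spaces and commas are delimiters too, every word becomes its own element, so `content2, content3` is not kept together.

```java
import java.util.Arrays;
import java.util.List;

class SplitOnAll {
    // Hypothetical helper: treats spaces, commas and parentheses all as delimiters.
    static List<String> splitOnAll(String input) {
        return Arrays.asList(input.split("[ ,()]+"));
    }

    public static void main(String[] args) {
        String s = "content (content1 (content2, content3) content4 (content5 (content6, content7))";
        List<String> tokens = splitOnAll(s);
        System.out.println(tokens.size() + " tokens: " + tokens);
    }
}
```

For the string in the question this gives eight single-word tokens, content through content7, rather than the six grouped strings requested.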