Regex for parsing a sentence while skipping content in parentheses

Time: 2012-08-11 18:33:30

Tags: java regex parsing

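I need a sentence parser. The parser splits a complete sentence on whitespace characters, but it treats everything inside a pair of parentheses as a single word (one parsed token).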

Input sentence:

"This is the work (my real job) which is great."

Required output:

This 

is 

the 

work

(my real job)

which 

is 

great.

2 answers:

Answer 0: (score: 2)

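I'm not sure there is a good way to use a regex to parse the words out of a sentence like this. You will probably need to iterate over the sentence anyway; I don't think String.split() will do this for you. Just write a loop that does it, and then you can handle the details of what happens when the parens don't match. For example, this version treats everything after an opening paren as one word, even if the sentence ends without a closing paren: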

     String s = "This is the work (my real job) which is great, and (also some stuff";

     ArrayList<String> words = new ArrayList<String>();
     Scanner sentence = new Scanner(s);
     boolean inParen = false;
     StringBuilder inParenWord = new StringBuilder();
     while(sentence.hasNext()) {
        String word = sentence.next();
        if(inParen) {
           inParenWord.append(" ");
           inParenWord.append(word);

           if(word.endsWith(")")) {
              words.add(inParenWord.toString());
              inParenWord = new StringBuilder();
              inParen = false;
           }
        }
        else {
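           // A one-token group such as "(job)" is already complete, so only open a group when ")" is still missing
           if(word.startsWith("(") && !word.endsWith(")")) {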
              inParen = true;
              inParenWord.append(word);
           }
           else {
              words.add(word);
           }
        }
     }

     if(inParenWord.length()>0) {
        words.add(inParenWord.toString());
     }


     for(String word : words) {
        System.out.println(word);
     }

This will output:

This
is
the
work
(my real job)
which
is
great,
and
(also some stuff

Or using Pattern/Matcher:

     String s = "This is the work (my real job) which is great, and (also some stuff";

     ArrayList<String> words = new ArrayList<String>();

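     // Either a parenthesized group (closing paren optional) or a run of non-whitespace not starting with "(".
     // The word branch needs only one character, so one-letter words such as "a" are kept.
     Pattern p = Pattern.compile("\\([^)]*\\)?|[^\\s(]\\S*");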
     Matcher m = p.matcher(s);

     while(m.find()) {
        words.add(s.substring(m.start(),m.end()).trim());
     }

     for(String word : words) {
        System.out.println(word);
     }
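
For anyone who wants to try the Pattern/Matcher idea as a standalone program, here is a minimal sketch. The class name `ParenTokenizer`, the helper `tokenize`, and the exact pattern `\([^)]*\)?|[^\s(]\S*` are choices made for this sketch, not something from the question; it assumes parentheses are never nested.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ParenTokenizer {
    // First branch: "(" up to the next ")" (the closing paren is optional,
    // so an unclosed group runs to the end of the input).
    // Second branch: a run of non-whitespace that does not start with "(".
    private static final Pattern TOKEN =
            Pattern.compile("\\([^)]*\\)?|[^\\s(]\\S*");

    // Hypothetical helper: returns the words of a sentence, keeping each
    // parenthesized group together as one word.
    public static List<String> tokenize(String sentence) {
        List<String> words = new ArrayList<String>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        for (String word : tokenize("This is the work (my real job) which is great.")) {
            System.out.println(word);
        }
    }
}
```

Run on the sentence from the question, this should print the eight words requested there, with "(my real job)" on a single line. A "(" glued to the end of a word, as in "foo(bar baz)", is not treated as the start of a group by this sketch.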

Answer 1: (score: 0)

I believe you need something like the following (though I'm not 100% sure this regex is correct). Put simply: match (word-with-no-spaces) | (\(words-and-spaces-non-greedy\))

^[[(\w)]*|[(\(.+?)\)]*]*$
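
In Java regex syntax, square brackets denote character classes rather than groups, so the pattern as posted probably does not behave as described. The following is one possible reading of the same idea (a non-greedy parenthesized group, or else a word without spaces), written as a sketch. The class name `StrictParenTokenizer`, the helper `tokenize`, and the pattern `\(.+?\)|\S+` are assumptions of this sketch, not the answerer's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StrictParenTokenizer {
    // "\\(.+?\\)" matches "(" up to the nearest ")" (non-greedy);
    // "\\S+" matches any other run of non-whitespace characters.
    // The parenthesized branch is tried first at each position.
    private static final Pattern TOKEN = Pattern.compile("\\(.+?\\)|\\S+");

    // Hypothetical helper: collects every match of TOKEN, in order.
    public static List<String> tokenize(String sentence) {
        List<String> words = new ArrayList<String>();
        Matcher m = TOKEN.matcher(sentence);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        for (String word : tokenize("This is the work (my real job) which is great.")) {
            System.out.println(word);
        }
    }
}
```

Unlike the loop in Answer 0, this version requires the closing paren: in an input such as "and (also some stuff" the unclosed group falls back to ordinary words, giving "(also", "some", and "stuff" as separate tokens.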