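Extracting data inside nested curly braces

Time: 2014-07-30 06:17:31

Tags: java regex

I want to extract the contents of the first nested pair of braces and of the second nested pair separately. I'm completely stuck right now; can anyone help? My file read.txt contains the data below. I simply read it into a String "s".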

    BufferedReader br = new BufferedReader(new FileReader("read.txt"));
    while (br.ready())
    {
        String s = br.readLine();
        System.out.println(s);
    }

Output

{ { "John", "ran" },                { "NOUN", "VERB" } },
{ { "The", "dog", "jumped"},        { "DET", "NOUN", "VERB" } },
{ {  "Mike","lives","in","Poland"}, {"NOUN","VERB","DET","NOUN"} },

That is, my output should look like

  "John", "ran"    
  "NOUN", "VERB" 
  "The", "dog", "jumped"  
  "DET", "NOUN", "VERB" 
  "Mike","lives","in","Poland" 
  "NOUN","VERB","DET","NOUN"

4 answers:

Answer 0: (score: 7)

Use this regex:

(?<=\{)(?!\s*\{)[^{}]+

See the matches in the Regex Demo.

In Java:

Pattern regex = Pattern.compile("(?<=\\{)(?!\\s*\\{)[^{}]+");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // matched text: regexMatcher.group()
}

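Explanation

  • The lookbehind (?<=\{) asserts that what precedes the current position is a {
  • The negative lookahead (?!\s*\{) asserts that what follows is not optional whitespace and then a {
  • [^{}]+ matches one or more characters that are not curly braces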
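To see the three parts working together, here is a minimal self-contained sketch that runs the pattern on one line of the question's data. The class name InnerBraceExtractor and the method extractInnermost are made up for illustration, and the trim() call is an addition, since the raw match keeps the spaces that sit next to the braces.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class InnerBraceExtractor {
    // Preceded by "{", not followed by (optional whitespace + "{"),
    // then one or more characters that are not curly braces.
    private static final Pattern INNER =
            Pattern.compile("(?<=\\{)(?!\\s*\\{)[^{}]+");

    static List<String> extractInnermost(String text) {
        List<String> groups = new ArrayList<>();
        Matcher matcher = INNER.matcher(text);
        while (matcher.find()) {
            // The match includes the padding next to the braces, so trim it.
            groups.add(matcher.group().trim());
        }
        return groups;
    }

    public static void main(String[] args) {
        String line = "{ { \"John\", \"ran\" },                { \"NOUN\", \"VERB\" } },";
        for (String group : extractInnermost(line)) {
            System.out.println(group);
        }
    }
}
```

The outer brace is skipped because the negative lookahead sees whitespace followed by another { right after it, so only the innermost groups are reported.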

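Answer 1: (score: 3)

If you split on "}," you get each set of words in a single string, and then you only need to strip the curly braces.

Based on your code: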

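BufferedReader br = new BufferedReader(new FileReader("read.txt"));
String s;
// readLine() returns null at end of file
while ((s = br.readLine()) != null) {
    String[] words = s.split("},");

    for (int x = 0; x < words.length; x++) {
        // strip the remaining braces and the surrounding whitespace
        String printme = words[x].replace("{", "").replace("}", "").trim();
        System.out.println(printme);
    }
}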

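Answer 2: (score: 1)

You can always remove the opening braces and then split on '},', which leaves you with the list of strings you asked for. (Provided the input is a single string, of course.)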

String s = input.replace("{","");
String[] splitString = s.split("},");

First remove the opening braces:

"John", "ran" },                "NOUN", "VERB" } },
"The", "dog", "jumped"},        "DET", "NOUN", "VERB" } },
"Mike","lives","in","Poland"},"NOUN","VERB","DET","NOUN"} },

Then split on },

"John", "ran"
"NOUN", "VERB" }
"The", "dog", "jumped"
"DET", "NOUN", "VERB" }
"Mike","lives","in","Poland"
"NOUN","VERB","DET","NOUN"}

Then you just need to tidy them up with another replace!
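The three steps above (remove the opening braces, split, tidy up) can be sketched end to end as follows. This is only one way to do the final cleanup: the class name BraceSplitter and the method splitGroups are hypothetical, the closing brace in the split pattern is escaped as a precaution, and trim() plus the empty-string check are additions to drop leftover whitespace.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original answer.
class BraceSplitter {
    static List<String> splitGroups(String input) {
        // Step 1: remove every opening brace.
        String withoutOpen = input.replace("{", "");
        // Step 2: split on "}," (split takes a regex, so the brace is escaped).
        String[] parts = withoutOpen.split("\\},");
        // Step 3: drop the stray closing braces and the padding.
        List<String> groups = new ArrayList<>();
        for (String part : parts) {
            String cleaned = part.replace("}", "").trim();
            if (!cleaned.isEmpty()) {
                groups.add(cleaned);
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String input = "{ { \"The\", \"dog\", \"jumped\"},        { \"DET\", \"NOUN\", \"VERB\" } },";
        for (String group : splitGroups(input)) {
            System.out.println(group);
        }
    }
}
```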

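Answer 3: (score: 1)

Another approach could be to search for {...} substrings that contain no inner { or } characters, and take only the inner part without the { and }.

A regex describing such a substring could look like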

\\{(?<content>[^{}]+)\\}

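Explanation:

  • \\{ is an escaped {, so it now stands for the literal { (normally it marks the start of a quantifier {x,y}, so it needs escaping)
  • (?<content>...) is a named capturing group; it stores only the part between { and }, so we can later use that part (rather than the whole match, which also includes { and })
  • [^{}]+ stands for one or more characters other than { and }
  • \\} is an escaped }, which means it stands for the literal }

Sample: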

String input = "{ { \"John\", \"ran\" },                { \"NOUN\", \"VERB\" } },\r\n" + 
        "{ { \"The\", \"dog\", \"jumped\"},        { \"DET\", \"NOUN\", \"VERB\" } },\r\n" + 
        "{ {  \"Mike\",\"lives\",\"in\",\"Poland\"}, {\"NOUN\",\"VERB\",\"DET\",\"NOUN\"} },";

Pattern p = Pattern.compile("\\{(?<content>[^{}]+)\\}");
Matcher m = p.matcher(input);
while(m.find()){
    System.out.println(m.group("content").trim());
}

Output:

"John", "ran"
"NOUN", "VERB"
"The", "dog", "jumped"
"DET", "NOUN", "VERB"
"Mike","lives","in","Poland"
"NOUN","VERB","DET","NOUN"
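
To tie this back to the question, where the data comes from read.txt rather than a hard-coded string, the same pattern can be applied line by line while reading. This is a rough sketch under that assumption; the class name BraceFileReader and the method extractFrom are invented for illustration, and try-with-resources is used so the reader gets closed.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class BraceFileReader {
    private static final Pattern INNER =
            Pattern.compile("\\{(?<content>[^{}]+)\\}");

    // Collects the trimmed content of every innermost {...} on every line.
    static List<String> extractFrom(BufferedReader reader) {
        List<String> groups = new ArrayList<>();
        reader.lines().forEach(line -> {
            Matcher m = INNER.matcher(line);
            while (m.find()) {
                groups.add(m.group("content").trim());
            }
        });
        return groups;
    }

    public static void main(String[] args) throws IOException {
        // "read.txt" is the file name used in the question.
        try (BufferedReader br = new BufferedReader(new FileReader("read.txt"))) {
            for (String group : extractFrom(br)) {
                System.out.println(group);
            }
        }
    }
}
```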