Question

I have a string, let's call it output, that is equal to the following:

ltm data-group internal str_testclass { 
    records { 
        baz { 
            data "value 1" 
        } 
        foobar { 
            data "value 2" 
        }
        topaz {}
    } 
    type string 
}

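I'm trying to extract the substring between the quotes for a given "record" name. So given foobar, I want to extract value 2. The substring I want to extract will always appear in the form shown above: after the "record" name come a space, an opening brace, a newline, whitespace, and the string data, and the substring I want to capture is the one between the quotes that follow. The only exception is when there is no value, which will always look like topaz above: in that case the "record" name is followed by an opening and a closing brace, and I would just like to get an empty string. How can I write a line of Java to capture this? So far I have...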

String myValue = output.replaceAll("(?:foobar\\s{\n\\s*data "([^\"]*)|()})","$1 $2");

But I'm not sure where to go from here.

Answer 1

Let's start by extracting the structure of the "records" with the following regular expression:

ltm\s+data-group\s+internal\s+str_testclass\s*\{\s*records\s*\{\s*(?<records>([^\s}]+\s*\{\s*(data\s*"[^"]*")?\s*\}\s*)*)\}\s*type\s*string\s*\}

Then, within the "records" group, just call find repeatedly with [^\s}]+\s*\{\s*(?:data\s*"(?<data>[^"]*)")?\s*\}\s*. The "data" group contains what you are looking for, and it is null in the "topaz" case.

As Java strings:

"ltm\\s+data-group\\s+internal\\s+str_testclass\\s*\\{\\s*records\\s*\\{\\s*(?<records>([^\\s}]+\\s*\\{\\s*(data\\s*\"[^\"]*\")?\\s*\\}\\s*)*)\\}\\s*type\\s*string\\s*\\}"
"[^\\s}]+\\s*\\{\\s*(?:data\\s*\"(?<data>[^\"]*)\")?\\s*\\}\\s*"

Demo (uses java.util.regex.Pattern and java.util.regex.Matcher):

String input = 
    "ltm data-group internal str_testclass {\n" + 
    "  records {\n" +
    "      baz {\n" + 
    "          data \"value 1\"\n" + 
    "      }\n" +
    "      foobar {\n" + 
    "          data \"value 2\"\n" + 
    "      }\n" +
    "      topaz {}\n" +
    "      empty { data \"\"}\n" +
    "    }\n" +
    "    type string\n" + 
    "}";

Pattern language = Pattern.compile("ltm\\s+data-group\\s+internal\\s+str_testclass\\s*\\{\\s*records\\s*\\{\\s*(?<records>([^\\s}]+\\s*\\{\\s*(data\\s*\"[^\"]*\")?\\s*\\}\\s*)*)\\}\\s*type\\s*string\\s*\\}");
Pattern record   = Pattern.compile("(?<name>[^\\s}]+)\\s*\\{\\s*(?:data\\s*\"(?<data>[^\"]*)\")?\\s*\\}\\s*");

Matcher lgMatcher = language.matcher(input);
if (lgMatcher.matches()) {
  String records = lgMatcher.group("records"); // only the body of the records block
  Matcher rdMatcher = record.matcher(records);
  while (rdMatcher.find()) {
    System.out.printf("%s:%s%n", rdMatcher.group("name"), rdMatcher.group("data"));
  }
} else {
  System.err.println("Language not recognized");
}

Output:

baz:value 1
foobar:value 2
topaz:null
empty:
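
Since the question asks for an empty string rather than null when a record has no data, the loop above could also collect the records into a map. The following is only a sketch under that assumption: the class name RecordMap and the helper parseRecords are hypothetical, and it skips the first validation step and scans the input directly with a slightly adapted record pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordMap {
    // A record is a name followed by braces that may contain one data "..." entry.
    private static final Pattern RECORD = Pattern.compile(
        "(?<name>[^\\s{}]+)\\s*\\{\\s*(?:data\\s*\"(?<data>[^\"]*)\")?\\s*\\}");

    // Hypothetical helper: maps each record name to its value, using "" when data is absent.
    static Map<String, String> parseRecords(String input) {
        Map<String, String> result = new LinkedHashMap<>();
        Matcher m = RECORD.matcher(input);
        while (m.find()) {
            String data = m.group("data"); // null when the optional group did not participate
            result.put(m.group("name"), data == null ? "" : data);
        }
        return result;
    }

    public static void main(String[] args) {
        String input = "ltm data-group internal str_testclass {\n"
            + "    records {\n"
            + "        baz {\n"
            + "            data \"value 1\"\n"
            + "        }\n"
            + "        foobar {\n"
            + "            data \"value 2\"\n"
            + "        }\n"
            + "        topaz {}\n"
            + "    }\n"
            + "    type string\n"
            + "}";
        // prints {baz=value 1, foobar=value 2, topaz=}
        System.out.println(parseRecords(input));
    }
}
```

Looking up parseRecords(input).get("foobar") then gives value 2, and get("topaz") gives an empty string.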

Alternatives: when parsing a custom language, you could try writing an ANTLR grammar or creating a Groovy DSL.

Answer 2

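Your regex shouldn't even compile, because you haven't escaped the " characters inside your regex string, so the string literal ends at the first " in your regex.

Instead, try this regex: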

String regex = key + "\\s\\{\\s*\\n\\s*data\\s*\"([^\"]*)\"";

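You can see how it works here on regex101.

Try this getRecord() method, where key is the record "name" you are searching for, e.g. foobar, and input is the string you want to search. It needs java.util.regex.Pattern and java.util.regex.Matcher.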

public static void main(String[] args) {
    String input = "ltm data-group internal str_testclass { \n" +
            "    records { \n" +
            "        baz { \n" +
            "            data \"value 1\" \n" +
            "        } \n" +
            "        foobar { \n" +
            "            data \"value 2\" \n" +
            "        }\n" +
            "        topaz {}\n" +
            "    } \n" +
            "    type string \n" +
            "}";

    String bazValue = getRecord("baz", input);
    String foobarValue = getRecord("foobar", input);
    String topazValue = getRecord("topaz", input);

    System.out.println("Record data value for 'baz' is '" + bazValue + "'");
    System.out.println("Record data value for 'foobar' is '" + foobarValue + "'");
    System.out.println("Record data value for 'topaz' is '" + topazValue + "'");
}

private static String getRecord(String key, String input) {
    String regex = key + "\\s\\{\\s*\\n\\s*data\\s*\"([^\"]*)\"";
    final Pattern pattern = Pattern.compile(regex);
    Matcher matcher = pattern.matcher(input);
    if (matcher.find()) {
        //if we find a record with data return it
        return matcher.group(1);
    } else {
        //else see if the key exists with empty {}
        final Pattern keyPattern = Pattern.compile(key);
        Matcher keyMatcher = keyPattern.matcher(input);
        if (keyMatcher.find()) {
            //return empty string if key exists with empty {}
            return "";
        } else {
            //else handle error, throw exception, etc.
            System.err.println("Record not found for key: " + key);
            throw new RuntimeException("Record not found for key: " + key);
        }
    }
}

Output:

Record data value for 'baz' is 'value 1'
Record data value for 'foobar' is 'value 2'
Record data value for 'topaz' is ''
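
One thing to watch with the fallback in getRecord() is that Pattern.compile(key) finds the key anywhere in the input, so a key such as data or records would return an empty string instead of an error. A possible variant, offered only as a sketch, folds both cases into one pattern with an optional data group and quotes the key with Pattern.quote. The class name RecordLookup and the method getRecordValue are hypothetical names, not part of the answer above.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RecordLookup {
    // Hypothetical variant of getRecord: one pattern covers both
    // name { data "..." } and name {}.
    static String getRecordValue(String key, String input) {
        Pattern pattern = Pattern.compile(
            "(?<!\\S)" + Pattern.quote(key)   // key must not be preceded by a non-whitespace character
            + "\\s*\\{\\s*(?:data\\s*\"([^\"]*)\")?\\s*\\}");
        Matcher matcher = pattern.matcher(input);
        if (!matcher.find()) {
            throw new IllegalArgumentException("Record not found for key: " + key);
        }
        String value = matcher.group(1);      // null when the record is written as name {}
        return value == null ? "" : value;
    }

    public static void main(String[] args) {
        String input = "records {\n"
            + "    foobar {\n"
            + "        data \"value 2\"\n"
            + "    }\n"
            + "    topaz {}\n"
            + "}";
        System.out.println("'" + getRecordValue("foobar", input) + "'"); // prints 'value 2'
        System.out.println("'" + getRecordValue("topaz", input) + "'");  // prints ''
    }
}
```

With this pattern, a key that only appears as part of another name (for example az) or as a keyword (for example data) is reported as not found.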

Answer 3

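You can try the following (the { is escaped because the JDK regex engine otherwise treats it as the start of a quantifier):

(?:foobar\s\{\s*data "(.*)")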
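
For the input in the question, (.*) and ([^"]*) capture the same text, but they differ when a line holds more than one quoted token, because .* is greedy. The sketch below illustrates this with a made-up input line. The class name GreedyDemo, the helper firstGroup, and the trailing # "note" are assumptions for illustration only, and the patterns escape { as \{ because the standard JDK regex engine rejects an unescaped { in that position.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns group 1 of the first match, or null when nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical input with a second quoted token on the data line.
        String input = "foobar {\n    data \"value 2\" # \"note\"\n}";

        // Greedy: runs to the last quote on the line.
        // prints value 2" # "note
        System.out.println(firstGroup("foobar\\s\\{\\s*data \"(.*)\"", input));

        // Negated character class: stops at the first closing quote.
        // prints value 2
        System.out.println(firstGroup("foobar\\s\\{\\s*data \"([^\"]*)\"", input));
    }
}
```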

Answer 4

I don't think replaceAll() is needed here. Would something like this work?

String var1 = "foobar";
String regex = "(?:" + var1 + "\\s\\{\\s*data \"([^\"]*)\")";

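You can then pass this as the regex to a Pattern and Matcher to find the substring.

You can easily turn this into a function so that you can pass in a variable for the search string: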

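public static String searchString(String str, String input)
{
    // Quote the record name and escape the literal '{' for the regex engine
    String regex = "(?:" + Pattern.quote(str) + "\\s\\{\\s*data \"([^\"]*)\")";
    Matcher matcher = Pattern.compile(regex).matcher(input);
    // Return the quoted value, or an empty string when no data entry is found
    return matcher.find() ? matcher.group(1) : "";
}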

Extracting a capture group from within a non-capturing group in Java
