Writing regex patterns so they are easier to understand/maintain?

Time: 2018-12-07 02:16:01

Tags: java regex

A regex pattern like this:

".*/.*/.*/.*/.*/.*/(.*)-\d{2}\.\d{2}\.\d{2}.\d{4}.*"

is really hard to maintain.

I wonder whether there is something like

".*<userName>/.*<envName>/.*<serviceName>/.*<dataType>/.*<date>/.*<host>/(.*)-\d{2}\.\d{2}\.\d{2}.\d{4}.*<fileName>"

to make the regex easier to read/understand?

Updated 2018-12-07

Thanks to @Liinux for the help. The feature is called free-spacing mode; a simple Java demo:

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Stream;

public class FreeSpacingDemo {
    public static void main(String[] args) {
        String re = "(?x)"
                + "# (?x) is the free-spacing flag\n"
                + "# anything from a '#' to the end of the line is ignored\n"
                + "# in free-spacing mode, whitespace between regular expression tokens is ignored\n"
                + "((?:19|20)\\d\\d)   # year (group 1); the century alternation is grouped so 19xx matches too\n"
                + "[-/\\.]             # separator\n"
                + "(\\d{2})            # month (group 2)\n"
                + "[-/\\.]             # separator\n"
                + "(\\d{2})            # day (group 3)";
        Pattern pattern = Pattern.compile(re);
        Stream.of("2018-12-07", "2018.12.07", "2018/12/07").forEach(aTest -> {
            System.out.println("**************** Testing: " + aTest);
            final Matcher matcher = pattern.matcher(aTest);
            if (matcher.find()) {
                // Group 0 is the whole match, so the capture groups start at 1.
                for (int i = 1; i <= matcher.groupCount(); i++) {
                    System.out.println("Group - " + i + ": " + matcher.group(i));
                }
            }
        });
    }
}

2 answers:

Answer 0 (score: 2)

If you are using Perl, you can simply enable the /x flag and put whitespace and comments inside the regex:

qr{
    .*  # userName
    /
    .*  # envName
    /
    .*  # serviceName
    /
    .*  # dataType
    /
    .*  # date
    /
    .*  # host
    /
    (.*)-\d{2}\.\d{2}\.\d{2}.\d{4}.*  # fileName
}x

That said, if what you mean is "any run of non-slash characters", then every .* should really be [^/]*
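To illustrate why that matters, here is a minimal Java sketch (Java being the language the question is tagged with). The class name, the two-segment patterns, and the sample path are made up for illustration; the point is only that a greedy `.*` can run across slashes, while `[^/]*` stays within one path segment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class SegmentDemo {
    // Greedy ".*" may swallow slashes; "[^/]*" stops at the next slash.
    static final Pattern LOOSE = Pattern.compile("(.*)/(.*)");
    static final Pattern STRICT = Pattern.compile("([^/]*)/([^/]*)");

    /** Returns group 1 if the whole input matches, otherwise null. */
    static String firstSegment(Pattern p, String input) {
        Matcher m = p.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String path = "alice/prod/billing"; // hypothetical sample path
        // LOOSE accepts the three-segment path and captures "alice/prod" as group 1.
        System.out.println(firstSegment(LOOSE, path));
        // STRICT expects exactly two segments, so the same input is rejected.
        System.out.println(firstSegment(STRICT, path));
    }
}
```

With `.*` the "first segment" silently grows to include a slash, which is usually not what a path pattern intends.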

You can also build the pattern from sensibly named variables:

my $userName =
my $envName =
my $serviceName =
my $dataType =
my $date =
my $host = qr{[^/]*};

my $fileName = qr{(.*)-\d{2}\.\d{2}\.\d{2}.\d{4}.*};

...
qr{$userName/$envName/$serviceName/$dataType/$date/$host/$fileName}
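The same idea carries over to Java. The sketch below is one possible translation, not part of the original answer: it assembles the pattern from named pieces and uses Java's named capturing groups, `(?<name>...)`, so each part can be read back by name. The class name, the `named` helper, and the sample path are hypothetical; the file-name part is copied from the question's pattern unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LogPathPattern {
    // One path segment: any run of characters that are not a slash.
    private static final String SEGMENT = "[^/]*";

    // File-name part taken from the question's pattern, with the capture named.
    private static final String FILE_NAME =
            "(?<fileName>.*)-\\d{2}\\.\\d{2}\\.\\d{2}.\\d{4}.*";

    static final Pattern PATTERN = Pattern.compile(
            named("userName") + "/"
          + named("envName") + "/"
          + named("serviceName") + "/"
          + named("dataType") + "/"
          + named("date") + "/"
          + named("host") + "/"
          + FILE_NAME);

    /** Wraps one path segment in a named capturing group. */
    private static String named(String name) {
        return "(?<" + name + ">" + SEGMENT + ")";
    }

    public static void main(String[] args) {
        // Hypothetical sample path, made up for illustration.
        Matcher m = PATTERN.matcher(
                "alice/prod/billing/logs/2018-12-07/host01/app-12.30.45.0123.log");
        if (m.matches()) {
            System.out.println("serviceName = " + m.group("serviceName"));
            System.out.println("fileName    = " + m.group("fileName"));
        }
    }
}
```

Named groups also keep calling code stable if a segment is later added or removed, because nothing depends on group numbers.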

Answer 1 (score: 1)

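If your language supports it, you can use free-spacing mode to add comments inside the regex. In free-spacing mode, whitespace is ignored (to match a literal space, escape it or, in most flavors, put it in a character class), and you can add comments with the # symbol.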

Tutorial example

# Match a 20th or 21st century date in yyyy-mm-dd format
(19|20)\d\d                # year (group 1)
[- /.]                     # separator
(0[1-9]|1[012])            # month (group 2)
[- /.]                     # separator
(0[1-9]|[12][0-9]|3[01])   # day (group 3)
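For Java users, the tutorial pattern can be sketched roughly as follows, using `Pattern.COMMENTS` (the flag form of `(?x)`). One caveat, as far as I know: Java's free-spacing mode also ignores unescaped whitespace inside character classes, so the space in the separator class is escaped here. The class and method names are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FreeSpacingDate {
    // Pattern.COMMENTS is Java's free-spacing flag (same effect as a leading "(?x)").
    static final Pattern DATE = Pattern.compile(
            "# Match a 20th or 21st century date in yyyy-mm-dd format\n"
          + "(19|20)\\d\\d               # century (group 1) plus two more digits\n"
          + "[-\\ /.]                    # separator; the space is escaped so it is kept\n"
          + "(0[1-9]|1[012])            # month (group 2)\n"
          + "[-\\ /.]                    # separator\n"
          + "(0[1-9]|[12][0-9]|3[01])   # day (group 3)",
            Pattern.COMMENTS);

    /** Returns {month, day} for a full match, or null when the input is not a date. */
    static String[] monthAndDay(String input) {
        Matcher m = DATE.matcher(input);
        return m.matches() ? new String[] {m.group(2), m.group(3)} : null;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"2018-12-07", "2018 12 07", "2018-13-07"}) {
            String[] md = monthAndDay(s);
            System.out.println(s + " -> " + (md == null ? "no match" : md[0] + "/" + md[1]));
        }
    }
}
```

Note that group 1 here captures only the century digits, exactly as in the tutorial pattern; wrap the whole year in its own group if you need all four digits.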