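Regex pattern to get an array from a string

Time: 2017-05-18 08:18:08

Tags: regex

I am looking for a regular expression that can parse a string (a URL) such as the one below into the array shown after it: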

/page/folder1/folder2/.../folderN/pagefile

[
    [1] => ['page', 'folder1', 'folder2', ..., 'folderN'],
    [2] => 'pagefile'
]

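I can't work out whether subpatterns can do this, or how. I can only use a regular expression, with no further code. Is this possible?

Edit 1: I know how to do this without a regular expression. That is not the question.

Edit 2: An answer to this question should help solve this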

2 answers:

Answer 0 (score: 2)

Pattern:

{{1}}

Demo/Explanation Link

Output/matches:

Match 1
    Full match  1-5 `page`
    Group 1.    1-5 `page`
Match 2
    Full match  6-13    `folder1`
    Group 1.    6-13    `folder1`
Match 3
    Full match  14-21   `folder2`
    Group 1.    14-21   `folder2`
Match 4
    Full match  22-25   `...`
    Group 1.    22-25   `...`
Match 5
    Full match  26-33   `folderN`
    Group 1.    26-33   `folderN`
Match 6
    Full match  34-42   `pagefile`
    Group 2.    34-42   `pagefile`

These are the matches the regex will return. If that does not fit your use case, then the answer is no.
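
The pattern itself is not shown in this copy of the answer, so the following is only a minimal Python sketch of how a listing like the one in this answer can be produced. The pattern in it is an assumption, not necessarily the one the answer used; it was picked because it yields the same full matches and the same group 1 / group 2 split on the example URL. The name `list_matches` is hypothetical.

```python
import re

# Assumed pattern (not taken from the answer): a segment that follows a slash is
# captured in group 1 when another slash follows it, otherwise in group 2.
SEGMENT = re.compile(r"(?<=/)(?:([^/]+)(?=/)|([^/]+)$)")


def list_matches(url):
    """Return (start, end, group1, group2) for every match found in url."""
    return [(m.start(), m.end(), m.group(1), m.group(2))
            for m in SEGMENT.finditer(url)]


for start, end, folder, pagefile in list_matches(
        "/page/folder1/folder2/.../folderN/pagefile"):
    print(start, end, folder, pagefile)
```

Collecting group 1 from every match except the last gives the folder array, but that collecting step is code outside the regex, which is why the answer ends with "no".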

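Answer 1 (score: 1)

Solution with 4 regexes:

I know you asked for a single regex, but posting a multi-regex solution first should make it easier to understand. You can do it with 4 replacements.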

1. Replace \b\w+\b with '\0' (demo here)
2. Replace ^\/ with [\n\t[1] => [ (demo here)
3. Replace \/ with , (demo here)
4. Replace ,('\w+')$ with ],\n\t[2] => \1\n] (demo here; this is the final result you are looking for)
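
The four replacements above can be sketched in Python as follows. This is an illustration under a few assumptions: the answer does not name a language, `path_to_array_text` is a hypothetical name, `\g<0>` is Python's spelling of the whole-match reference written as `\0` above, and the replacements containing newlines and tabs are passed as functions so they are inserted literally.

```python
import re


def path_to_array_text(path):
    """Apply the four replacements in order and return the resulting text."""
    # Step 1: wrap every word in single quotes.
    text = re.sub(r"\b\w+\b", r"'\g<0>'", path)
    # Step 2: the leading slash becomes the opening of the outer and inner arrays.
    text = re.sub(r"^/", lambda _: "[\n\t[1] => [", text)
    # Step 3: every remaining slash becomes a comma.
    text = re.sub(r"/", ",", text)
    # Step 4: the last quoted word is split off as element [2].
    text = re.sub(r",('\w+')$",
                  lambda m: "],\n\t[2] => " + m.group(1) + "\n]",
                  text)
    return text


print(path_to_array_text("/page/folder1/folder2/pagefile"))
```

Note that `\w` does not match `.`, so a literal `...` segment would be left unquoted; the sketch only covers segments made of word characters.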

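Solution with 1 regex:

I warn you, it's ugly. Before applying it, you need to append the following string to the end of the original string:
'[\n\t[1] => [],\n\t[2] => ''\n]
So you need to do something like this (I don't know which language you are using):
final_string = replace(original_string + "'[\n\t[1] => [],\n\t[2] => ''\n]", match_regex, replace_regex)

So here is the regex: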

\b(\w+)\b(?=\/[^']*('))|^\/(?=.*(\[\n\t\[1\] => \[))|\/(?=[^\/]*\/.*(,))|\/(\w+)(?=.*(],\n\t\[2\] => ')('\n\]))|'.*$


Replace with: \2\1\2\3\4\6\5\7

See the demo here
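
As one possible rendering of the replace call described above, here is a minimal Python sketch. The pattern, the replacement and the appended string are copied from the answer; the Python wrapping, the constant names and the function name `convert` are assumptions. `re.DOTALL` is used as the single-line flag, and the sketch relies on Python 3.5+ expanding groups that did not participate in a match to an empty string.

```python
import re

# Pattern and replacement copied from the answer; only the wrapping is new.
MATCH_REGEX = re.compile(
    r"\b(\w+)\b(?=\/[^']*('))|^\/(?=.*(\[\n\t\[1\] => \[))|\/(?=[^\/]*\/.*(,))"
    r"|\/(\w+)(?=.*(],\n\t\[2\] => ')('\n\]))|'.*$",
    re.DOTALL,  # "single-line" flag: lets . cross the newlines in the suffix
)
REPLACE_REGEX = r"\2\1\2\3\4\6\5\7"
# String appended to the input so the lookaheads can capture the output pieces.
SUFFIX = "'[\n\t[1] => [],\n\t[2] => ''\n]"


def convert(original_string):
    """Append the helper suffix, then run the single substitution."""
    return MATCH_REGEX.sub(REPLACE_REGEX, original_string + SUFFIX)


print(convert("/page/folder1/pagefile"))
```

Every piece of output text that is not in the input (quotes, commas, brackets) is captured from the appended suffix by a lookahead, which is why the suffix has to be added before matching.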

Here is a commented version:

(?x)    # turn on free-spacing mode
        # literal spaces are written as "\ " below, because free-spacing mode ignores bare spaces
    \b(\w+)\b (?=    # match an entire word and capture it (group 1)
        \/    # must be followed by a / (so the last word is not matched)
        [^']*    # then by any sequence of characters other than '
        (')    # and then a ', which is captured into group 2
    ) |    # OR
    ^\/ (?=    # match a / at the beginning of the string
        .*    # followed by any sequence of characters
        (\[\n\t\[1\]\ =>\ \[)    # followed by this specific sequence (captured into group 3)
    ) |    # OR
    \/ (?=    # match a / (not at the beginning this time)
        [^\/]*    # followed by any sequence of characters that are not /
        \/    # followed by a / (so the last / is not matched)
        .*    # then any sequence of characters
        (,)    # then a , (captured into group 4)
    ) |    # OR
    \/(\w+) (?=    # match a word preceded by a / (capture the word into group 5)
                   # note that the only word still unmatched at this point is the last one
        .*    # followed by any sequence of characters
        (],\n\t\[2\]\ =>\ ')    # then this specific sequence (captured into group 6)
        ('\n\])    # then this specific sequence (captured into group 7)
    ) |    # OR
    '.*$    # match everything that is left: this is the previously appended string,
            # which is erased here because it is matched and replaced by nothing
(?-x)

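It could be optimized, but it should be enough for your needs...
Remember to enable the single-line flag for it to work (I could do it without the single-line flag, but it would be even uglier).
If you want only one regex and cannot append anything to your string, then I can't do any more for you (I won't say it is impossible, but I think it is).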