Regex pattern to get an array from a string

Question

I am looking for a regular expression that can parse a string (a URL):

/page/folder1/folder2/.../folderN/pagefile

into

[
    [1] => ['page', 'folder1', 'folder2', ..., 'folderN'],
    [2] => 'pagefile'
]

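I can't figure out whether and how subpatterns would work for this. I can only use a regular expression, without any further code. Is it possible?

Edit 1: I know how to do this without a regular expression. That is not the question.

Edit 2: The answer to this question should help solve this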

Answer 1

Pattern:

[pattern not preserved]

Demo/Explanation Link

Output/Matches:

Match 1
    Full match  1-5 `page`
    Group 1.    1-5 `page`
Match 2
    Full match  6-13    `folder1`
    Group 1.    6-13    `folder1`
Match 3
    Full match  14-21   `folder2`
    Group 1.    14-21   `folder2`
Match 4
    Full match  22-25   `...`
    Group 1.    22-25   `...`
Match 5
    Full match  26-33   `folderN`
    Group 1.    26-33   `folderN`
Match 6
    Full match  34-42   `pagefile`
    Group 2.    34-42   `pagefile`

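These are the matches the regular expression will return: one match per path segment, with folders in group 1 and the page file in group 2. If that does not fit your use case, then the answer is no.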
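
For illustration only, here is a minimal Python sketch of that kind of match list. The pattern below is an assumption, not the pattern from this answer: it is simply one two-alternative regex that puts every segment followed by another `/` into group 1 and the final segment into group 2. The name `split_path` is hypothetical, and the loop that collects the groups is exactly the "further code" the question wanted to avoid.

```python
import re

# Assumed pattern (not the answerer's original):
#   group 1 = a segment that is followed by another "/"
#   group 2 = the last segment of the path
SEGMENT = re.compile(r"(?<=/)([^/]+)(?=/)|(?<=/)([^/]+)$")

def split_path(url):
    """Collect group-1 matches into a list and keep the group-2 match."""
    folders, pagefile = [], None
    for match in SEGMENT.finditer(url):
        if match.group(1) is not None:
            folders.append(match.group(1))
        else:
            pagefile = match.group(2)
    return folders, pagefile

print(split_path("/page/folder1/folder2/folderN/pagefile"))
```

The regex alone only yields a flat sequence of matches; grouping them into the nested array still takes a line or two of host-language code.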

Answer 2

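Solution with 4 regular expressions:

I know you asked for a single regex, but posting the multi-regex solution first will help in understanding it. You can do this with 4 replacements.

1. Replace \b\w+\b with '\0' (demo here)
2. Replace ^\/ with [\n\t[1] => [ (demo here)
3. Replace \/ with , (demo here)
4. Replace ,('\w+')$ with ],\n\t[2] => \1\n] (demo here; this is the final result you are looking for)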
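
The four replacements can be sketched in Python as follows. This is only an illustration under a few assumptions: the answer does not name a language, the function name `four_step_replace` is made up, and `\g<0>` is Python's spelling of the `\0` whole-match reference used in step 1.

```python
import re

def four_step_replace(url):
    # Step 1: wrap every word in single quotes.
    s = re.sub(r"\b\w+\b", r"'\g<0>'", url)
    # Step 2: turn the leading "/" into the opening of the outer and inner arrays.
    s = re.sub(r"^/", r"[\n\t[1] => [", s)
    # Step 3: turn every remaining "/" into a comma.
    s = re.sub(r"/", ",", s)
    # Step 4: move the last quoted word out of the inner array into key 2.
    s = re.sub(r",('\w+')$", r"],\n\t[2] => \1\n]", s)
    return s

print(four_step_replace("/page/folder1/folder2/folderN/pagefile"))
```

Note that the result is a string that looks like the requested array, not an actual array value.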

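Solution with 1 regular expression:

I warn you, this is ugly. Before applying it to your string, you need to append the following string to the end of the original string: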
'[\n\t[1] => [],\n\t[2] => ''\n]
So you need to do something like this (I don't know which language you are using):
final_string = replace(original_string + "'[\n\t[1] => [],\n\t[2] => ''\n]", match_regex, replace_regex)

So here is the regex:

\b(\w+)\b(?=\/[^']*('))|^\/(?=.*(\[\n\t\[1\] => \[))|\/(?=[^\/]*\/.*(,))|\/(\w+)(?=.*(],\n\t\[2\] => ')('\n\]))|'.*$

Replace with: \2\1\2\3\4\6\5\7

See the demo here

Here is a commented version:

(?x)    # turn on free spacing mode
    \b(\w+)\b (?=    # match an entire word and capture it (group 1)
        \/    # must be followed by a / (so last word is not matched)
        [^']*    # then by a sequence of any character except '
        (')    # and then a ' which is captured into group 2
    ) |    # OR
    ^\/ (?=    # match a / at the beginning of the string
        .*    # followed by a sequence of any character
        (\[\n\t\[1\] => \[)    # followed by this specific sequence (captured into group 3)
    ) |    # OR
    \/ (?=    # match a / (not at the beginning this time)
        [^\/]*    # followed by any sequence of characters that are not /
        \/    # followed by a / (so only the last / is not matched)
        .*    # then any sequence of characters
        (,)    # then a , (captured into group 4)
    ) |    # OR
    \/(\w+) (?=    # match a word beginning with a / (capture the word into group 5)
                   # note that the only word still not matched there should be the last one
        .*    # followed by any sequence of characters
        (],\n\t\[2\] => ')    # then this specific sequence (captured into group 6)
        ('\n\])    # then this specific sequence (captured into group 7)
    ) |    # OR
    '.*$    # match everything possible (this is the previously appended string, which is erased here
            # because it is matched and replaced by nothing)
(?-x)

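It could be optimized, but it should be enough for your needs...
Remember to enable the single-line flag for it to work (I could do it without the single-line flag, but it would be even uglier).
If you want only one regex and cannot append anything to your string, then I can't do more for you (I won't say it is impossible, but I think it is).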
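
As a rough check of the single-regex approach, here is a Python sketch. The choice of Python is an assumption (the answer is language-agnostic), as are the names `SUFFIX`, `PATTERN`, `REPLACEMENT` and `single_regex_replace`. `re.DOTALL` plays the role of the single-line flag, and the sketch relies on Python 3.5+ replacing unmatched groups with an empty string.

```python
import re

# The helper string that must be appended before matching.
SUFFIX = "'[\n\t[1] => [],\n\t[2] => ''\n]"

# The regex from the answer, split over several lines for readability only.
PATTERN = re.compile(
    r"\b(\w+)\b(?=\/[^']*('))"
    r"|^\/(?=.*(\[\n\t\[1\] => \[))"
    r"|\/(?=[^\/]*\/.*(,))"
    r"|\/(\w+)(?=.*(],\n\t\[2\] => ')('\n\]))"
    r"|'.*$",
    re.DOTALL,  # the "single-line" flag: "." also matches newlines
)

# Groups that did not take part in a given match expand to "" (Python 3.5+).
REPLACEMENT = r"\2\1\2\3\4\6\5\7"

def single_regex_replace(url):
    return PATTERN.sub(REPLACEMENT, url + SUFFIX)

print(single_regex_replace("/page/folder1/folder2/folderN/pagefile"))
```

Each alternative borrows the text it needs from the appended suffix through a lookahead capture, and the last alternative then deletes the suffix itself.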
