I am looking for a regular expression that can parse a string (a URL) of the form:
/page/folder1/folder2/.../folderN/pagefile
into
[
[1] => ['page', 'folder1', 'folder2', ..., 'folderN'],
[2] => 'pagefile'
]
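I can't work out whether subpatterns can do this, or how. I can only use a regular expression, with no further code. Is this possible?
Edit 1: I know how to do this without a regular expression. That is not the question.
Edit 2: An answer to this question should help solve this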
Answer 0 (score: 2)
Pattern:
Match 1
Full match 1-5 `page`
Group 1. 1-5 `page`
Match 2
Full match 6-13 `folder1`
Group 1. 6-13 `folder1`
Match 3
Full match 14-21 `folder2`
Group 1. 14-21 `folder2`
Match 4
Full match 22-25 `...`
Group 1. 22-25 `...`
Match 5
Full match 26-33 `folderN`
Group 1. 26-33 `folderN`
Match 6
Full match 34-42 `pagefile`
Group 2. 34-42 `pagefile`
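Output/matches: the matches listed above are what the regex will return. Each folder arrives as a separate match in group 1, and the page file as a final match in group 2; a regex alone cannot nest them into one array. If this does not fit your use case, then the answer is no.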
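To illustrate what "separate matches" means in practice, here is a minimal Python sketch. The pattern in it is an assumption: it is one pattern that happens to reproduce the match list above (same groups, same offsets), not necessarily the one this answer used. `split_path` is a hypothetical helper name, and the loop that assembles the nested result is exactly the "further code" that the regex itself cannot replace.

```python
import re

# Hypothetical pattern, chosen only because it reproduces the match list above:
# group 1 = a segment followed by "/", group 2 = the last segment.
PATTERN = re.compile(r"([^/]+)(?=/)|([^/]+)$")

def split_path(path):
    """Collect group-1 matches as folders and the group-2 match as the page file."""
    folders, page = [], None
    for m in PATTERN.finditer(path):
        if m.group(1) is not None:
            folders.append(m.group(1))
        else:
            page = m.group(2)
    return folders, page

folders, page = split_path("/page/folder1/folder2/.../folderN/pagefile")
print(folders, page)
```

The regex only yields a flat sequence of matches; the grouping into `[1]` and `[2]` happens in the loop.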
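Answer 1 (score: 1)
I know you asked for a single regex, but posting a solution with several regexes first will help in understanding it.
You can do it with 4 replacements: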
1. Replace \b\w+\b
with '\0'
2. Replace ^\/
with [\n\t[1] => [
3. Replace \/
with ,
4. Replace ,('\w+')$
with ],\n\t[2] => \1\n]
(this gives the final result you are looking for)
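The four replacements above can be sketched in Python as follows. This is only an illustration under a few assumptions: the question does not say which language or regex engine is in use, `to_array_literal` is a hypothetical helper name, and Python writes the whole-match backreference as `\g<0>` rather than `\0`.

```python
import re

def to_array_literal(path):
    """Apply the four replacements from the list above, in order."""
    # 1. Wrap every word in single quotes (\g<0> is the whole match in Python).
    s = re.sub(r"\b\w+\b", r"'\g<0>'", path)
    # 2. Turn the leading slash into the opening of the outer and inner arrays.
    s = re.sub(r"^/", "[\n\t[1] => [", s)
    # 3. Turn every remaining slash into a comma.
    s = re.sub(r"/", ",", s)
    # 4. Close the inner array before the last quoted word and emit it as [2].
    s = re.sub(r",('\w+')$", "],\n\t[2] => \\1\n]", s)
    return s

print(to_array_literal("/page/folder1/folder2/pagefile"))
```

Running the steps one at a time and printing `s` after each is a convenient way to see what every replacement contributes.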
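Now for the single regex. I warn you, it is ugly. Before applying it, you need to append the following string to the end of the original string: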
'[\n\t[1] => [],\n\t[2] => ''\n]
So you need to do something like this (I don't know which language you are using):
final_string = replace(original_string + "'[\n\t[1] => [],\n\t[2] => ''\n]", match_regex, replace_regex)
And here is the regex:
\b(\w+)\b(?=\/[^']*('))|^\/(?=.*(\[\n\t\[1\] => \[))|\/(?=[^\/]*\/.*(,))|\/(\w+)(?=.*(],\n\t\[2\] => ')('\n\]))|'.*$
Replace with: \2\1\2\3\4\6\5\7
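As a rough sketch of how the append-then-replace call could look in Python (an assumption, since the target language is unknown): `SUFFIX`, `PATTERN` and `convert` are hypothetical names, `re.DOTALL` plays the role of the single-line flag, and the code relies on Python 3.5 or later, where groups that did not participate in a match are substituted as empty strings.

```python
import re

# Text appended to the input so that the lookaheads can capture the literal
# pieces ("'", "[\n\t[1] => [", ",", ...) that the replacement needs.
SUFFIX = "'[\n\t[1] => [],\n\t[2] => ''\n]"

# The regex from the answer, split across lines at each top-level "|".
PATTERN = re.compile(
    r"\b(\w+)\b(?=\/[^']*('))"
    r"|^\/(?=.*(\[\n\t\[1\] => \[))"
    r"|\/(?=[^\/]*\/.*(,))"
    r"|\/(\w+)(?=.*(],\n\t\[2\] => ')('\n\]))"
    r"|'.*$",
    re.DOTALL,  # "single-line" mode: "." also matches newlines
)

def convert(path):
    """Append the helper suffix, then run the single replacement."""
    return PATTERN.sub(r"\2\1\2\3\4\6\5\7", path + SUFFIX)

print(convert("/page/folder1/folder2/pagefile"))
```

Only one alternative matches at a time, so most of the backreferences in the replacement expand to nothing; the last alternative then erases the appended suffix.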
Here is a commented version:
(?x)                        # turn on free-spacing mode
\b(\w+)\b (?=               # match an entire word and capture it (group 1)
    \/                      # must be followed by a / (so the last word is not matched)
    [^']*                   # then by a sequence of any characters except '
    (')                     # and then a ', which is captured into group 2
) |                         # OR
^\/ (?=                     # match a / at the beginning of the string
    .*                      # followed by a sequence of any characters
    (\[\n\t\[1\]\ =>\ \[)   # followed by this specific sequence (captured into group 3);
                            # the spaces are escaped because free-spacing mode ignores unescaped whitespace
) |                         # OR
\/ (?=                      # match a / (not at the beginning this time)
    [^\/]*                  # followed by any sequence of characters that are not /
    \/                      # followed by a / (so only the last / is not matched)
    .*                      # then any sequence of characters
    (,)                     # then a , (captured into group 4)
) |                         # OR
\/(\w+) (?=                 # match a word preceded by a / (capture the word into group 5)
                            # note that the only word still unmatched at this point should be the last one
    .*                      # followed by any sequence of characters
    (],\n\t\[2\]\ =>\ ')    # then this specific sequence (captured into group 6)
    ('\n\])                 # then this specific sequence (captured into group 7)
) |                         # OR
'.*$                        # match everything that is left (this is the previously appended string,
                            # which is erased here because it is matched and replaced by nothing)
(?-x)
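It could be optimized, but it should be good enough for your needs...
Remember to enable the single-line flag (so that . also matches newlines) for this to work. I could do it without the single-line flag, but it would be even uglier.
If you want only one regex and cannot append anything to your string, then I can't do more for you (I won't say it is impossible, but I think it is).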