Isolating URL paths using capture groups

Time: 2018-05-30 03:13:30

Tags: regex

Is it possible to have n capture groups?

For example,

http://www.example.com/first-path
http://www.example.com/first-path/second-path
http://www.example.com/first-path/second-path/third-path
http://www.example.com/something.html
http://www.example.com/first-path?id=5

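I am trying to capture first-path as group 1, second-path as group 2, and third-path as group 3 with http:\/\/(.*)\/(?!.*\/$)(.*), but it does not split the path into segments.

I am not using any specific programming language.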

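1 answer:

Answer 0: (score: 1)

If you are using PHP, you can do the following. The first split removes the leading http://www.example.com/ part, and the second split then breaks the remainder on /: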

$urls = array('http://www.example.com/first-path',
'http://www.example.com/first-path/second-path',
'http://www.example.com/first-path/second-path/third-path',
'http://www.example.com/something.html',
'http://www.example.com/first-path?id=5');

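foreach ($urls as $url) {
    // Split on the scheme-and-host prefix; with empty pieces dropped,
    // element 0 is everything after the first single slash.
    $tail = preg_split('#https?://[^/]+/#', $url, -1, PREG_SPLIT_NO_EMPTY)[0];
    // Split the remainder into one array element per path segment.
    $paths = preg_split('#/#', $tail);
    print_r($paths);
}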

Output:

Array
(
    [0] => first-path
)
Array
(
    [0] => first-path
    [1] => second-path
)
Array
(
    [0] => first-path
    [1] => second-path
    [2] => third-path
)
Array
(
    [0] => something.html
)
Array
(
    [0] => first-path?id=5
)

Something similar can be done in JavaScript:

let urls = ['http://www.example.com/first-path',
'http://www.example.com/first-path/second-path',
'http://www.example.com/first-path/second-path/third-path',
'http://www.example.com/something.html',
'http://www.example.com/first-path?id=5'];
console.log(urls.map(s => s.split(/https?:\/\/[^\/]+\//)[1].split(/\//)))

Output:

Array(5) [ … ]
  0: Array [ "first-path" ]
  1: Array [ "first-path", "second-path" ]
  2: Array(3) [ "first-path", "second-path", "third-path" ]
  3: Array [ "something.html" ]
  4: Array [ "first-path?id=5" ]
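
Note that in both outputs above the last URL keeps its query string (first-path?id=5). As a minimal sketch of one way around that, assuming an environment where the standard URL class is available (browsers or Node.js), the path can be parsed first and then split. The helper names pathSegments and firstThreeSegments are hypothetical and do not come from the answer. The second helper illustrates the original question: a pattern has a fixed number of capture groups, so a regex-only variant has to spell out each optional segment (three here).

```javascript
// Hypothetical helper: split a URL's path into segments, ignoring query and hash.
function pathSegments(url) {
  // The URL parser keeps the query (?id=5) in .search, separate from .pathname.
  const { pathname } = new URL(url);
  // filter(Boolean) drops empty strings caused by leading or trailing slashes.
  return pathname.split('/').filter(Boolean);
}

// Regex-only variant with exactly three capture groups; the second and third
// are optional. [^\/?#]+ stops each segment at a slash, query, or fragment.
const THREE_SEGMENTS =
  /^https?:\/\/[^\/]+\/([^\/?#]+)(?:\/([^\/?#]+))?(?:\/([^\/?#]+))?/;

// Hypothetical helper: return up to three segments using the fixed pattern.
function firstThreeSegments(url) {
  const match = THREE_SEGMENTS.exec(url);
  // slice(1) skips the full match; unmatched optional groups are undefined.
  return match ? match.slice(1).filter(s => s !== undefined) : [];
}

const urls = [
  'http://www.example.com/first-path',
  'http://www.example.com/first-path/second-path',
  'http://www.example.com/first-path/second-path/third-path',
  'http://www.example.com/something.html',
  'http://www.example.com/first-path?id=5',
];

for (const url of urls) {
  console.log(pathSegments(url), firstThreeSegments(url));
}
```

The split-based version handles any number of segments, while the fixed pattern silently ignores anything beyond the third, which is why splitting is usually the simpler choice when n is not known in advance.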