I am writing a regex in PHP to extract the various parts of a URL, like so:
$rule = "/^(?P<scheme>(http[s]?|ftp|mailto)):\/\/(?P<auth>([a-zA-Z]+:[a-zA-Z0-9-_]+))?@?(?P<domain>([a-zA-Z0-9-_]+).([a-z.]+)):?(?P<port>([0-9]{2,4}))?\/?(?P<path>[a-z0-9-\/]+(?=\/?))?\?(?P<query>[a-z0-9-_=\\?]+)?\/?(?P<hash>[#a-z0-9-_]+)$/";
$url = "https://user:password@store.example.co.uk:80/search?q=term?lang=en#anchor";
if (preg_match($rule, $url, $matches)) {
    foreach ($matches as $key => $match) {
        if (is_string($key)) {
            $params[$key] = $match;
        }
    }
    print_r($params);
}
The above code gives me:
Array
(
[scheme] => https
[auth] => user:password
[domain] => store.example.co.uk
[port] => 80
[path] => search
[query] => q=term?lang=en
[hash] => #anchor
)
But I want to get something like this:
Array
(
[scheme] => https
[auth] => user:password
[domain] => Array
    (
    [sub-domain] => store
    [domain-name] => example
    [top-level-domain] => co.uk
    )
[port] => 80
[path] => search
[query] => Array
    (
    [q] => term
    [lang] => en
    )
[hash] => #anchor
)
Is there a way I can achieve this using only regex, or should I combine regex with some other PHP function to split the various parts further?
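One non-regex direction that might work, shown only as a minimal sketch: PHP's built-in parse_url() splits the URL into its top-level parts, and parse_str() turns a query string into an array. The sketch assumes that the second "?" in my sample query string is meant as a pair separator, so it is replaced with "&" before parsing; that replacement is my assumption, not something either function does for you.

```php
<?php
// Sample URL from the question.
$url = "https://user:password@store.example.co.uk:80/search?q=term?lang=en#anchor";

// parse_url() returns the components it finds:
// scheme, host, port, user, pass, path, query, fragment.
$parts = parse_url($url);

// Assumption: the extra "?" inside the query string acts as a pair separator,
// so it is normalised to "&" before parse_str() splits the pairs.
$query = [];
parse_str(str_replace('?', '&', $parts['query'] ?? ''), $query);

print_r($parts);
print_r($query);
```

Note that parse_url() reports the user and password separately, returns the path with its leading slash, and returns the fragment without the "#", so the keys differ slightly from the named groups in my regex.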
NB: the top-level domain could be either .co.uk or .com, or anything in that format, and the domain could be www.example.com, example.com, or store.example.com. Some parts of the URL are optional, and I still want to get each part when the optional ones are not specified.
For example, if I skip the sub-domain part, "example" becomes the sub-domain and ".com" becomes the domain; I still want "example" to be the domain when no sub-domain is specified.
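To make the behaviour I am after concrete, here is a rough sketch of the splitting rule. The function name splitHost() and the short list of multi-part suffixes are hypothetical and only for illustration; a regex alone cannot tell that "co.uk" is a single suffix, so a real solution would presumably need a complete list such as the Public Suffix List.

```php
<?php
// Hypothetical helper: splits a host into sub-domain, domain name and top-level domain.
// $multiPartTlds is an illustrative, incomplete list (an assumption for this sketch).
function splitHost(string $host, array $multiPartTlds = ['co.uk', 'org.uk', 'com.au']): array
{
    $labels = explode('.', strtolower($host));

    // Use two labels for the TLD when the last two form a known multi-part suffix.
    $lastTwo = implode('.', array_slice($labels, -2));
    $tldLength = (count($labels) > 2 && in_array($lastTwo, $multiPartTlds, true)) ? 2 : 1;

    // Remove the TLD labels from the end of $labels and join them.
    $tld = implode('.', array_splice($labels, -$tldLength));

    // The label just before the TLD is the domain name; whatever is left is the sub-domain.
    $name = array_pop($labels) ?? '';
    $sub = implode('.', $labels);

    return [
        'sub-domain' => $sub,
        'domain-name' => $name,
        'top-level-domain' => $tld,
    ];
}

print_r(splitHost('store.example.co.uk'));
print_r(splitHost('example.com'));
```

With this rule, "example.com" yields an empty sub-domain and "example" as the domain name, which is the result I want when no sub-domain is given.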
Thank you.