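Using regular expressions to identify URL types and act on them

Time: 2011-05-26 12:04:55

Tags: php regex

I have URLs like the ones below that need to be recognized by PHP code. Depending on what the URL contains, different data should be displayed: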

www.example.com/music/username/popular
http://www.example.com/music/username/recent/
http://example.com/music/username/favorites/ignore_this /*Ignore everything after favorites*/
http://www.example.com/music/2011/05/02 /*Shows all music uploaded on this date*/
www.example.com/groups
http://www.example.com/groups/jazz
http://example.com/places/japan/?param=ignore_this /*Ignore everything after japan*/
www.example.com/search/rock/

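The first URL should show the user's popular music. www.example.com/groups should list all public groups. And so on.

  • http:// is optional
  • The trailing / is optional
  • If anything is typed in uppercase (for example, groups typed as GROUPS), it should be converted to lowercase

What is the best way to build switch cases that recognize these URLs using regular expressions? A sample code snippet would be great.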

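3 answers:

Answer 0: (score: 0)

Find the URLs with a general-purpose URL regex, and use preg_replace_callback() to invoke a callback that extracts the parts you need with parse_url().

Don't try to do too much in a single regex.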
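As a rough sketch of the parse_url() half of this suggestion (the preg_replace_callback() step for finding URLs inside larger text is left out), the path can be pulled out and split into lowercase segments. The helper name pathSegments is hypothetical, and the scheme is prepended because parse_url() does not recognise a host without one:

```php
<?php
// Hypothetical helper: returns the lowercased path segments of a URL.
function pathSegments(string $url): array
{
    // parse_url() only recognises the host when a scheme is present,
    // so add "http://" for inputs such as "www.example.com/groups".
    if (!preg_match('#^[a-z][a-z0-9+.-]*://#i', $url)) {
        $url = 'http://' . $url;
    }

    // PHP_URL_PATH leaves out the query string (?param=...).
    $path = parse_url($url, PHP_URL_PATH);
    if (!is_string($path)) {
        return []; // no path at all, or a malformed URL
    }

    // Lowercase, drop leading/trailing slashes, then split on "/".
    $path = trim(strtolower($path), '/');
    return $path === '' ? [] : explode('/', $path);
}

$segments = pathSegments('http://example.com/places/japan/?param=ignore_this');
// $segments is ['places', 'japan']
```

The first segment can then drive a plain switch, with no further regex needed for most of the cases in the question.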

Answer 1: (score: 0)

This is the system I use (it is OOP, but it can easily be changed if you don't like classes).

if($this->request->uriMatch('#^/$#'))   //simplest regexp, no substring is matched
    $this->home();  //show the Home page
elseif($this->request->uriMatch('#^/news/(\d+)\.html$#')) //matches a number!
    $this->newsItem($this->request->uri(0),0); // calls newsItem() function and passes the first (0th) matched substring (in our case it's number) to it as an argument
elseif($this->request->uriMatch('#^/news_(\d{4})_(\d{1,2})\.html$#')) //matches 2 numbers
    $this->newsList(0,$this->request->uri(0),$this->request->uri(1)); //passes both numbers to function newsList()
elseif($this->request->uriMatch('#^/products/latest(?:-(\d+))?\.html$#')) //may match one number, or may not match anything
    $this->products('latest',$this->request->uri(0,1)); //if matched, passes the matched number, if not: passes "1" (as default value)
elseif($this->request->uriMatch('#^/products/(\d+)(?:-(\d+))?\.html$#')) //may match 1 or 2 numbers, this is a mix of previous 2 cases :)
    $this->products($this->request->uri(0),$this->request->uri(1,1));
else    //if nothing was matched, then 404!
    $this->response->redirect('/404.html');

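Note that (?: ) is a non-capturing subpattern: it groups part of the regex but does not add an entry to the matched substrings, so the indexes passed to uri() are not affected by it.

An example for the case you gave: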
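To see what the non-capturing group does in practice, here is a small standalone check using the "latest" pattern from the code above. The variable names are only for illustration:

```php
<?php
// (?:-(\d+))? groups "-N" so the whole piece can be optional,
// but only the inner (\d+) produces a captured substring.
$pattern = '#^/products/latest(?:-(\d+))?\.html$#';

preg_match($pattern, '/products/latest-3.html', $m);
// $m is ['/products/latest-3.html', '3']

preg_match($pattern, '/products/latest.html', $m2);
// $m2 is ['/products/latest.html']: the optional capture did not take part,
// so no second element is set.
```

This is why uri(0,1) in the routing code falls back to the default value 1 when the number is missing.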

if($this->request->uriMatch('#^/music/([a-z0-9]+)/favorites/?#i'))

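The ? means the trailing / may be absent. Note that there is no $ at the end, which means everything after favorites is ignored. The i modifier (after the closing #) makes the match case-insensitive.

$this->request is an instance of class Request, which looks like this: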
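Extending the same idea to the other URLs in the question might look roughly like the sketch below. It works on the path only (as $_SERVER['REQUEST_URI'] would supply it). The function name classifyPath, the route labels, and the character classes allowed for usernames and group names are all assumptions, not something the question specifies:

```php
<?php
// Hypothetical router: maps a request path to a [type, arguments] pair.
function classifyPath(string $path): array
{
    // Date route first, so /music/2011/05/02 is not read as a username.
    if (preg_match('#^/music/(\d{4})/(\d{2})/(\d{2})/?$#', $path, $m)) {
        return ['music_by_date', [$m[1], $m[2], $m[3]]];
    }
    // No $ anchor after the list name: whatever follows it is ignored.
    if (preg_match('#^/music/([a-z0-9_]+)/(popular|recent|favorites)(?:/|$)#i', $path, $m)) {
        return ['user_music', [strtolower($m[1]), strtolower($m[2])]];
    }
    // /groups lists everything, /groups/jazz selects one group.
    if (preg_match('#^/groups(?:/([a-z0-9_]+))?/?$#i', $path, $m)) {
        return ['groups', isset($m[1]) ? [strtolower($m[1])] : []];
    }
    // Anything after the place name (slash or query string) is ignored.
    if (preg_match('#^/places/([a-z]+)(?:[/?]|$)#i', $path, $m)) {
        return ['place', [strtolower($m[1])]];
    }
    if (preg_match('#^/search/([^/?]+)#i', $path, $m)) {
        return ['search', [strtolower($m[1])]];
    }
    return ['not_found', []];
}

list($type, $args) = classifyPath('/music/username/favorites/ignore_this');
// $type is 'user_music', $args is ['username', 'favorites']
```

The returned type can be fed straight into a switch statement, which keeps each regex small and the dispatch logic readable.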

class Request{
    private $uri;   //this holds the URI
    private $uriArray;  //this will hold the matched substrings of the URI according to our REGEXPs
    public function __construct(){
        // initializes URI, it doesn't contain http:// and the domain!
        $this->uri = $_SERVER['REQUEST_URI'];
    }
    public function uriMatch($regex){
        // parses URL according to REGEX
        $b = preg_match($regex, $this->uri, $this->uriArray); // $b is 1 on a match, 0 on no match, false on error
        if($b==1)   //on a match, uriArray contains the matched URL AND the captured substrings (http://am.php.net/manual/en/function.preg-match.php).
            array_shift($this->uriArray); // we are removing the first element (which is the URL), we need only matched substrings

        return $b==1; //returns true if and only if the URL was matched!
    }

    public function uri($n, $default=false){
        //returns n-th matched substring, or $default, if it was not set
        // ... one can add some error handling here
        return isset($this->uriArray[$n]) ? $this->uriArray[$n] : $default;
    }
}

Answer 2: (score: 0)

This is the regex I would use:

preg_match('%(?:www\.)?example\.com/(\w+)/?(\w+)?/?(\w+)?/?(\w+)?%i', $url, $matchee) // $url (assumed name) is the URL string to test; captures go into $matchee

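You can test whether $matchee holds an actual match and build the different cases from it: ignore_this, or the date parts. strtolower($matchee[1]) will contain the lowercased first element after the domain, and so on.

Note: I recommend RegexBuddy as a tool for debugging regular expressions. I use it often.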
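A minimal sketch of how those cases could be built with a switch is shown below. It assumes the URL is passed as the second argument of preg_match() with $matchee as the third, and that the dots in the pattern are escaped. The function name describeUrl and the returned strings are made up for illustration:

```php
<?php
// Hypothetical dispatcher built on the single regex from this answer.
function describeUrl(string $url): string
{
    $pattern = '%(?:www\.)?example\.com/(\w+)/?(\w+)?/?(\w+)?/?(\w+)?%i';
    if (!preg_match($pattern, $url, $matchee)) {
        return 'no match';
    }

    // Optional elements that did not match are simply not set.
    $second = isset($matchee[2]) ? strtolower($matchee[2]) : '';
    $third  = isset($matchee[3]) ? strtolower($matchee[3]) : '';

    switch (strtolower($matchee[1])) {
        case 'music':
            // Assumption: an all-digit second element is the year of a date.
            if (ctype_digit($second)) {
                return "music uploaded on $second/$third/" . ($matchee[4] ?? '');
            }
            // A fourth element such as ignore_this is deliberately not used.
            return "$third music of user $second";
        case 'groups':
            return $second === '' ? 'all public groups' : "group $second";
        case 'places':
            return "place $second";
        case 'search':
            return "search for $second";
        default:
            return 'unknown section';
    }
}

echo describeUrl('www.example.com/GROUPS'), "\n";
```

One caveat of the digit check: a username consisting only of digits would be treated as a year, so a stricter date test may be needed in real use.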