Regular expression to match URLs of a site's home/index page

Date: 2012-12-18 19:47:47

Tags: regex url

I'm trying to use a regular expression to tell whether a given URL is a site's index page. That means it must match domain.com, domain.com/ and domain.com/index.php, but not domain.com/page.php.

Here is the list I came up with for testing. There are many permutations because of www/non-www, http/https, trailing slashes, and so on.

It should match these:

It should not match these:

(Are there any other combinations I've missed?)
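Because the permutations come from independent choices (host with or without www, http or https, and the path/query suffix), the should-match set can be generated instead of written by hand. A minimal shell sketch, assuming the placeholder host site.com and a sample query string var=X; the function name index_url_permutations is hypothetical:

```shell
# Hypothetical helper: print every "should match" URL for host $1, varying
# www/non-www, http/https, and the path/query suffix. "var=X" is a sample query.
index_url_permutations() {
  for h in "$1" "www.$1"; do
    for scheme in http https; do
      for suffix in /index.php / '' '/index.php?var=X' '/?var=X' '?var=X'; do
        printf '%s://%s%s\n' "$scheme" "$h" "$suffix"
      done
    done
  done
}

index_url_permutations site.com   # 2 hosts x 2 schemes x 6 suffixes = 24 lines
```

A similar loop over page-style suffixes (/page.php, /page/, /page/index.php, ...) would produce the should-not-match side, so a candidate regex can be rerun against the whole set after every change.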

So far, all I've come up with is:

site.com(/|index.php|)

This is obviously wrong, since it also matches the /page URLs.
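The failure is easy to reproduce with grep -E. Nothing anchors the pattern to the start or end of the URL, so grep only has to find site.com somewhere in the string, and the /page URLs contain it just as well. A minimal sketch; check is a hypothetical helper, and the empty alternative "|)" is rewritten as the equivalent "?":

```shell
# The question's attempt as a POSIX ERE. "(/|index.php|)" is written as the
# equivalent "(/|index.php)?"; the dots stay unescaped, as in the question.
attempt='site.com(/|index.php)?'

# Hypothetical helper: print MATCH or NOT MATCH for URL $2 against ERE $1.
check() {
  if printf '%s\n' "$2" | grep -Eq -- "$1"; then
    echo "MATCH $2"
  else
    echo "NOT MATCH $2"
  fi
}

check "$attempt" 'http://site.com/'           # MATCH, as intended
check "$attempt" 'http://site.com/page.php'   # MATCH too: the pattern is unanchored
```

Anchoring with ^ and $, while still allowing for the varying scheme and host in front, is what answer 0 adds.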

2 answers:

Answer 0 (score: 7)

This works:

^https?://[^/]+(/(\?.*|index\.php(\?.*)?)?)?$

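Note that this is a generic regex (it works as a POSIX extended regex, as in the egrep test below). Depending on your regex flavor, you may need to add escaping; for example, in a /-delimited PHP preg_match() pattern every / must be written as \/, or you can pick a different delimiter such as #.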
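To illustrate the escaping point in a shell setting, a minimal sketch: inside single quotes the shell passes every backslash through unchanged, so grep -E can take the regex verbatim, and sed, whose addresses are normally delimited by /, can be given another delimiter with the \cREGEXc form instead of escaping each /. The names INDEX_RE, is_index_url and filter_index_urls are hypothetical:

```shell
# The answer's regex, single-quoted so the shell does not touch the backslashes.
INDEX_RE='^https?://[^/]+(/(\?.*|index\.php(\?.*)?)?)?$'

# Hypothetical helper: exit status 0 if $1 is an index-page URL.
is_index_url() {
  printf '%s\n' "$1" | grep -Eq -- "$INDEX_RE"
}

# Hypothetical helper: copy only index-page URLs from stdin to stdout.
# "\#...#" makes "#" the address delimiter, so the many "/" need no escaping.
filter_index_urls() {
  sed -En "\\#${INDEX_RE}#p"
}

# Prints http://site.com/ and https://www.site.com/index.php
printf '%s\n' http://site.com/ http://site.com/page.php https://www.site.com/index.php |
  filter_index_urls
```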

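Running a simple test with grep -E (equivalent to the older egrep) against the URLs in a file named data gives:

$ while read -r x
>       do
>           if  echo "$x" | grep -E '^https?://[^/]+(/(\?.*|index\.php(\?.*)?)?)?$' > /dev/null
>           then
>               echo MATCH "$x"
>           else
>               echo NOT MATCH "$x"
>           fi
>       done < data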
MATCH http://site.com/index.php
MATCH http://site.com/
MATCH http://site.com
MATCH http://site.com/index.php?var=X
MATCH http://site.com/?var=X
MATCH http://site.com?var=X
MATCH https://site.com/index.php
MATCH https://site.com/
MATCH https://site.com
MATCH https://site.com/index.php?var=X
MATCH https://site.com/?var=X
MATCH https://site.com?var=X
MATCH http://www.site.com/index.php
MATCH http://www.site.com/
MATCH http://www.site.com
MATCH http://www.site.com/index.php?var=X
MATCH http://www.site.com/?var=X
MATCH http://www.site.com?var=X
MATCH https://www.site.com/index.php
MATCH https://www.site.com/
MATCH https://www.site.com
MATCH https://www.site.com/index.php?var=X
MATCH https://www.site.com/?var=X
MATCH https://www.site.com?var=X
NOT MATCH http://site.com/page.php
NOT MATCH http://site.com/page.php?var=X
NOT MATCH http://site.com/page
NOT MATCH http://site.com/page/
NOT MATCH http://site.com/page/index.php
NOT MATCH http://site.com/page?var=X
NOT MATCH http://site.com/page/?var=X
NOT MATCH https://site.com/page.php
NOT MATCH https://site.com/page.php?var=X
NOT MATCH https://site.com/page
NOT MATCH https://site.com/page/
NOT MATCH https://site.com/page/index.php
NOT MATCH https://site.com/page?var=X
NOT MATCH https://site.com/page/?var=X
NOT MATCH http://www.site.com/page.php
NOT MATCH http://www.site.com/page.php?var=X
NOT MATCH http://www.site.com/page
NOT MATCH http://www.site.com/page/
NOT MATCH http://www.site.com/page/index.php
NOT MATCH http://www.site.com/page?var=X
NOT MATCH http://www.site.com/page/?var=X
NOT MATCH https://www.site.com/page.php
NOT MATCH https://www.site.com/page.php?var=X
NOT MATCH https://www.site.com/page
NOT MATCH https://www.site.com/page/
NOT MATCH https://www.site.com/page/index.php
NOT MATCH https://www.site.com/page?var=X
NOT MATCH https://www.site.com/page/?var=X

Answer 1 (score: 0)

Assuming you're doing this in PHP, you should use parse_url() (http://php.net/manual/en/function.parse-url.php) and then look at the path component.

<?php
$url = "http://example.com/index.php?page=1";
$path = parse_url($url, PHP_URL_PATH);
print "path=$path\n";
?>

Run it and you get:

path=/index.php

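Once you have just the path in $path, simply compare it against /, /index.php, or whatever else you accept. For a URL with no path at all, such as http://site.com, parse_url() returns null, so treat that as the index page too. No regex is needed.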
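For comparison with the regex answer, the same idea carries over to the shell: strip the scheme, fragment, and query with parameter expansion, then compare what is left of the path against a fixed list. This is only a rough analogue of parse_url(), assuming absolute URLs of the form scheme://host[/path][?query][#fragment]; url_path and is_index_by_path are hypothetical names:

```shell
# Hypothetical helper: print the path part of URL $1 (empty if there is none).
url_path() {
  rest=${1#*://}        # drop "scheme://"
  rest=${rest%%\#*}     # drop "#fragment", if any
  rest=${rest%%\?*}     # drop "?query", if any
  case $rest in
    */*) printf '/%s\n' "${rest#*/}" ;;   # path = everything from the first "/"
    *)   printf '\n' ;;                   # no path, like parse_url() returning null
  esac
}

# Hypothetical helper: exit status 0 if the path is empty, "/" or "/index.php".
is_index_by_path() {
  case $(url_path "$1") in
    ''|/|/index.php) return 0 ;;
    *) return 1 ;;
  esac
}

url_path 'http://example.com/index.php?page=1'   # prints /index.php
```

Comparing against a short list of accepted paths keeps the rule readable, and extending it (for example to /index.html) is a one-word change.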