RegEx in Classic ASP

Date: 2013-11-29 12:44:22

Tags: regex asp-classic

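I currently have a string containing a URL, and I need to get the base URL.

My string is http://www.test.com/test-page/category.html

I'm looking for a RegEx that strips any page/folder names. The problem is that people may enter the domain in any of the following formats: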

http://www.test.com
www.test.co.uk/
www.test.info/test-page.html
www.test.gov/test-folder/test-page.html

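Every time, it must return http://www.websitename.ext/, i.e. the domain name and extension (e.g. .info, .com, .co.uk), followed by a trailing forward slash.

In effect, it needs to return the base URL without any page/folder names. Is there a simple way to do this with a regular expression?

Thanks.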

2 answers:

Answer 0: (score: 1)

My approach: use a RegEx to extract the domain name, then prepend http:// and append /. Here is the RegEx:

^(?:http:\/\/)?([\w_]+(?:\.[\w_]+)+)(?=(?:\/|$))

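See also this answer to the question "Extract root domain name from string" (which leaves me somewhat unsatisfied, although it does point out the need to account for https, port numbers, and user authentication info, none of which my RegEx handles).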

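Below is an implementation in VBScript. I put the RegEx in a constant and defined a function named GetDomainName(). You should be able to include that function in your ASP page and call it like this: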

normalizedUrl = "http://" & GetDomainName(url) & "/"

You can also test my script from the command prompt by saving the code to a file named test.vbs and passing it to cscript:

cscript test.vbs

Test program

Option Explicit

Const REGEXPR = "^(?:http:\/\/)?([\w_]+(?:\.[\w_]+)+)(?=(?:\/|$))"
'                    ^^^^^^^^^   ^^^^^^   ^^^^^^^^^^       ^^^^
'                        A         B1         B2            C
'
' A  - An optional 'http://' scheme
' B1 - Followed by one or more word characters (letters, digits, underscore)
' B2 - Followed by one or more occurrences of a string
'      that begins with a period that is followed by
'      one or more word characters, and
' C  - Terminated by a slash or the end of the string (lookahead, not captured).

Function GetDomainName(sUrl)
   Dim oRegex, oMatches

   Set oRegex = New RegExp
   oRegex.Pattern = REGEXPR
   oRegex.IgnoreCase = True
   oRegex.Global = False
   Set oMatches = oRegex.Execute(sUrl)

   If oMatches.Count > 0 Then
       GetDomainName = oMatches(0).SubMatches(0)
   Else
       GetDomainName = ""
   End If
End Function

Dim Data : Data = _
    Array( _
            "xhttp://www.test.com" _
          , "http://www..test.com" _
          , "http://www.test.com." _
          , "http://www.test.com" _
          , "www.test.co.uk/" _
          , "www.test.co.uk/?q=42" _
          , "www.test.info/test-page.html" _
          , "www.test.gov/test-folder/test-page.html" _
          , ".www.test.co.uk/" _
          )

Dim sUrl, sDomainName
For Each sUrl In Data
    sDomainName = GetDomainName(sUrl)

    If sDomainName = "" Then
        WScript.Echo "[ ] [" & sUrl & "]"
    Else
        WScript.Echo "[*] [" & sUrl & "] => [" & sDomainName & "]"
    End If
Next

Expected output:

[ ] [xhttp://www.test.com]
[ ] [http://www..test.com]
[ ] [http://www.test.com.]
[*] [http://www.test.com] => [www.test.com]
[*] [www.test.co.uk/] => [www.test.co.uk]
[*] [www.test.co.uk/?q=42] => [www.test.co.uk]
[*] [www.test.info/test-page.html] => [www.test.info]
[*] [www.test.gov/test-folder/test-page.html] => [www.test.gov]
[ ] [.www.test.co.uk/]
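
This answer mentions that https, port numbers, and user authentication info deserve consideration. The following is a minimal, untested sketch of how the pattern might be widened to cover them. EXTENDED_REGEXPR, GetHostName, and NormalizeUrl are hypothetical names that are not part of the original answer. Allowing hyphens in host labels and dropping the port from the result are assumptions made for this illustration.

```vbscript
Option Explicit

' Hypothetical extension of the pattern above (an assumption, not the author's code):
'   (?:https?:\/\/)?        optional http:// or https:// scheme
'   (?:[^@\/]+@)?           optional user info such as user:password@
'   ([\w-]+(?:\.[\w-]+)+)   host name: two or more dot-separated labels (captured)
'   (?::\d+)?               optional port number (matched but not captured)
'   (?=\/|$)                must be followed by a slash or the end of the string
Const EXTENDED_REGEXPR = "^(?:https?:\/\/)?(?:[^@\/]+@)?([\w-]+(?:\.[\w-]+)+)(?::\d+)?(?=\/|$)"

' Returns the captured host name, or "" when the input does not match.
Function GetHostName(sUrl)
    Dim oRegex, oMatches

    Set oRegex = New RegExp
    oRegex.Pattern = EXTENDED_REGEXPR
    oRegex.IgnoreCase = True
    oRegex.Global = False
    Set oMatches = oRegex.Execute(sUrl)

    If oMatches.Count > 0 Then
        GetHostName = oMatches(0).SubMatches(0)
    Else
        GetHostName = ""
    End If
End Function

' Builds http://host/ and returns "" for unrecognized input,
' so callers never receive the malformed string "http:///".
Function NormalizeUrl(sUrl)
    Dim sHost
    sHost = GetHostName(Trim(sUrl))

    If sHost = "" Then
        NormalizeUrl = ""
    Else
        NormalizeUrl = "http://" & LCase(sHost) & "/"
    End If
End Function
```

With this sketch, NormalizeUrl("https://user:pw@www.test.co.uk:8080/page.html") should yield http://www.test.co.uk/, while inputs such as http://www..test.com should still be rejected with an empty string.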

Answer 1: (score: 0)

I haven't coded in Classic ASP in 12 years, and this is completely untested.

result = "http://" & Split(Replace(url, "http://",""),"/")(0) & "/"
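
The one-liner above has a few edge cases. Replace is case-sensitive by default, an https:// prefix is left in place, and indexing (0) raises a "Subscript out of range" error when the input is empty, because Split("") returns an empty array. The sketch below shows one way the same Split idea might be wrapped to avoid these. GetBaseUrl is a hypothetical name used for illustration, and like the original line it does not validate the host name.

```vbscript
Option Explicit

' Hypothetical helper built on the Split approach (illustrative, not the author's code).
Function GetBaseUrl(sUrl)
    Dim sRest, aParts

    sRest = Trim(sUrl)

    ' Strip a leading scheme, compared case-insensitively.
    If LCase(Left(sRest, 7)) = "http://" Then
        sRest = Mid(sRest, 8)
    ElseIf LCase(Left(sRest, 8)) = "https://" Then
        sRest = Mid(sRest, 9)
    End If

    ' Split("") returns an empty array, so check the bound before indexing.
    ' VBScript does not short-circuit Or, hence the separate ElseIf.
    aParts = Split(sRest, "/")
    If UBound(aParts) < 0 Then
        GetBaseUrl = ""
    ElseIf aParts(0) = "" Then
        GetBaseUrl = ""
    Else
        GetBaseUrl = "http://" & aParts(0) & "/"
    End If
End Function
```

For the question's sample input, GetBaseUrl("http://www.test.com/test-page/category.html") should give http://www.test.com/.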