Classic ASP regular expression to get all pages from the local domain

Date: 2012-09-06 11:08:22

Tags: regex asp-classic

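Hello, I need a regular expression that returns all links pointing to the local domain, with no external sites. So far I have this, but it only returns external pages: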

<%
Function getPage(strURL)
    Dim strBody, objXML, status
    strBody = ""

    Set objXML = CreateObject("Msxml2.ServerXMLHTTP.6.0")
    objXML.Open "GET", strURL, False
    'objXML.setRequestHeader "User-Agent", "ddd" '===  falsify the agent
    'objXML.setRequestHeader "Content-Type", "text/html; Charset:ISO-8859-1"
    'objXML.setRequestHeader "Content-Type", "text/html; Charset:UTF-8"

    'Trap runtime errors so that a failed request reaches the check below
    On Error Resume Next
    objXML.Send
    status = objXML.status
    If Err.Number <> 0 Or status <> 200 Then
        If status = 404 Then
            Response.Write "[EFERROR]Page does not exist (404)."
        ElseIf status = 401 Then
            Response.Write "[EFERROR]Access denied (401)."
        ElseIf status >= 500 And status < 600 Then
            Response.Write "[EFERROR]500 Internal Server Error on remote site."
        Else
            Response.Write "[EFERROR]Server is down or does not exist."
        End If
    End If
    strBody = objXML.responseText
    On Error GoTo 0

    Set objXML = Nothing
    getPage = strBody

    'First, create a reg exp object
    Dim objRegExp
    Set objRegExp = New RegExp

    objRegExp.IgnoreCase = True
    objRegExp.Global = True
    objRegExp.Pattern = "<a\s+href=""http://(.*?)"">\s*((\n|.)+?)\s*</a>"

    'Display all of the matches
    Dim objMatch
    For Each objMatch In objRegExp.Execute(strBody)
        Response.Write("http://" & objMatch.SubMatches(0) & "<br>")
    Next
End Function

getPage("http://www.google.com")
%>

Thanks

1 answer:

Answer 0 (score: 0)

Perhaps I'm stating the obvious, but if you are searching for links to "localdomain.com", wouldn't the pattern look like this?

objRegExp.Pattern = "<a\s+href=""http://(.*?)localdomain\.com"">\s*((\n|.)+?)\s*</a>"

Edit: The regular expression pattern could perhaps use the passed-in URL, like this:

objRegExp.Pattern = "<a\s+href=""" & strURL & "(.*?)"">\s*((\n|.)+?)\s*</a>"
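Because strURL is concatenated straight into the pattern, any regex metacharacters in it (the dots in a host name, a ? in a query string) are read as regex syntax rather than literal text. A minimal sketch of escaping the URL first; EscapeRegex is a hypothetical helper name used for illustration, not something built into VBScript:

```vbscript
' Hypothetical helper: prefixes every regex metacharacter with a backslash
' so that the text is matched literally when used inside a pattern.
Function EscapeRegex(strText)
    Dim objEsc
    Set objEsc = New RegExp
    objEsc.Global = True
    ' Character class listing the metacharacters to escape
    objEsc.Pattern = "([\\\^\$\.\|\?\*\+\(\)\[\]\{\}])"
    ' In the replacement string the backslash is literal and $1 is the match
    EscapeRegex = objEsc.Replace(strText, "\$1")
End Function

' Possible use with the pattern above:
' objRegExp.Pattern = "<a\s+href=""" & EscapeRegex(strURL) & "(.*?)"">\s*((\n|.)+?)\s*</a>"
```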

The retrieved matches would then also need strURL prepended:

For Each objMatch in objRegExp.Execute(strBody)
  Response.Write(strURL & objMatch.SubMatches(0) & "<br>")
Next
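
One possible reason the original pattern returns only external pages is that it requires every href to start with http://, while links within a site are often written as root-relative paths such as /about.asp. Below is a rough sketch under that assumption. GetLocalLinks, strBase and the example.com address are hypothetical names chosen for illustration. It handles only double-quoted href values, and it skips document-relative links such as page.asp.

```vbscript
' Returns a Scripting.Dictionary whose keys are the absolute URLs of local links.
' strBody: HTML source of the page.
' strBase: site root without a trailing slash, e.g. "http://www.example.com".
Function GetLocalLinks(strBody, strBase)
    Dim objLinks, objRe, objMatch, strHref, strNext
    Set objLinks = CreateObject("Scripting.Dictionary")
    Set objRe = New RegExp
    objRe.IgnoreCase = True
    objRe.Global = True
    ' Capture the value of every double-quoted href on an <a> tag
    objRe.Pattern = "<a\s[^>]*?href=""([^""]*)"""

    For Each objMatch In objRe.Execute(strBody)
        strHref = objMatch.SubMatches(0)
        ' Character right after the base; rejects hosts that merely share a prefix
        strNext = Mid(strHref, Len(strBase) + 1, 1)
        If LCase(Left(strHref, Len(strBase))) = LCase(strBase) _
           And (strNext = "" Or strNext = "/" Or strNext = "?" Or strNext = "#") Then
            ' Absolute link that already points at the local site
            objLinks(strHref) = True
        ElseIf Left(strHref, 1) = "/" And Left(strHref, 2) <> "//" Then
            ' Root-relative link: prepend the base to make it absolute
            objLinks(strBase & strHref) = True
        End If
    Next

    Set GetLocalLinks = objLinks
End Function

' Possible use inside the ASP page, after strBody has been fetched:
' Dim strLink
' For Each strLink In GetLocalLinks(strBody, "http://www.example.com").Keys
'     Response.Write strLink & "<br>"
' Next
```

Using a Dictionary keyed by URL also removes duplicates, since a page often links to the same address more than once.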