ASP.NET MVC: How do I block all visitors to a specific controller except search bots (Googlebot, Yahoo Slurp, etc.)?

Answer 1

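You can create a filter attribute that rejects requests based on the User-Agent header. The usefulness of this is questionable (it is not a security feature), because the header is easy to spoof, but it will stop people from reaching the controller with a stock browser.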

This page contains a list of the user agent strings used by Googlebot.

Sample code that redirects non-Googlebot requests to the 404 (NotFound) action on an Error controller:

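using System;
using System.Web.Mvc;
using System.Web.Routing;

[AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
public class BotRestrictAttribute : ActionFilterAttribute {

    public override void OnActionExecuting(ActionExecutingContext c) {
      // Googlebot sends more than one user agent string, so look for the "Googlebot"
      // token instead of comparing against a single exact string.
      string userAgent = c.HttpContext.Request.UserAgent ?? string.Empty;
      if (userAgent.IndexOf("Googlebot", StringComparison.OrdinalIgnoreCase) < 0) {
        // Not Googlebot: redirect to Error/NotFound through the route named "error".
        c.Result = new RedirectToRouteResult("error", new RouteValueDictionary(new { action = "NotFound", controller = "Error" }));
      }
    }
}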
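Because Googlebot uses several user agent strings, matching on a token is more robust than matching one exact string. The helper below is a minimal sketch of that idea: it checks the header for any of a few crawler tokens, case-insensitively. The class name CrawlerUserAgent and the token list (Googlebot, Slurp, msnbot, taken from the crawlers named in this thread) are assumptions, not a complete or authoritative list, and like any User-Agent check it is trivially spoofed.

```csharp
using System;
using System.Linq;

public static class CrawlerUserAgent
{
    // Hypothetical token list, based on the crawlers mentioned in this thread only.
    private static readonly string[] Tokens = { "Googlebot", "Slurp", "msnbot" };

    // Case-insensitive substring match: any user agent string that contains
    // one of the tokens is treated as that crawler.
    public static bool IsKnownCrawler(string userAgent)
    {
        if (string.IsNullOrEmpty(userAgent)) return false;
        return Tokens.Any(t => userAgent.IndexOf(t, StringComparison.OrdinalIgnoreCase) >= 0);
    }
}
```

Inside the filter, the condition would then read: if (!CrawlerUserAgent.IsKnownCrawler(c.HttpContext.Request.UserAgent)).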

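Edit in response to the comment: if server load is the concern with the sitemap, restricting access to bots may not be enough. Googlebot on its own can bring a server to a halt if it decides to crawl it heavily. You should cache the response as well. You can do that with the same FilterAttribute and the ASP.NET cache (HttpContext.Cache).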

Here is a very rough example; it may need tweaking with the proper HTTP headers:

using System;
using System.Web.Mvc;
using System.Web.Routing;

[AttributeUsage(AttributeTargets.Method, AllowMultiple = false)]
public class BotRestrictAttribute : ActionFilterAttribute {

    // Cache key shared with the sitemap action that stores the generated XML.
    public const string SitemapKey = "sitemap";

    public override void OnActionExecuting(ActionExecutingContext c) {
      string userAgent = c.HttpContext.Request.UserAgent ?? string.Empty;
      if (userAgent.IndexOf("Googlebot", StringComparison.OrdinalIgnoreCase) < 0) {
        c.Result = new RedirectToRouteResult("error", new RouteValueDictionary(new { action = "NotFound", controller = "Error" }));
        return;
      }

      // If a sitemap is cached, return it directly so the action method is skipped.
      var sitemap = c.HttpContext.Cache[SitemapKey] as string;
      if (sitemap != null) {
        c.Result = new ContentResult { Content = sitemap, ContentType = "application/xml" };
      }
    }
}

//In the sitemap action method (requires using System.Web.Caching;)
string sitemapString = GetSitemap();
HttpContext.Cache.Add(
 BotRestrictAttribute.SitemapKey, //cache key
 sitemapString, //data
 null, //No dependencies
 DateTime.Now.AddMinutes(1), //absolute expiration: one minute from now
 Cache.NoSlidingExpiration, 
 CacheItemPriority.Low, 
 null //no callback
);
return Content(sitemapString, "application/xml");
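The caching above follows a cache-aside pattern with an absolute expiry: build the sitemap on a miss, store it for one minute, and serve the stored copy until then. The sketch below shows that pattern outside ASP.NET so it can be run and tested on its own. AbsoluteExpiryCache and SitemapCache are hypothetical stand-ins for HttpContext.Cache and the action method, not part of any framework; the clock is passed in explicitly so expiry can be checked deterministically.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical stand-in for the ASP.NET cache: each entry expires at an absolute time,
// like Cache.Add(..., DateTime.Now.AddMinutes(1), Cache.NoSlidingExpiration, ...).
public sealed class AbsoluteExpiryCache
{
    private readonly ConcurrentDictionary<string, (string Value, DateTime ExpiresAt)> _entries =
        new ConcurrentDictionary<string, (string Value, DateTime ExpiresAt)>();

    // Stores (or overwrites) a value that stays valid until expiresAt.
    public void Set(string key, string value, DateTime expiresAt) =>
        _entries[key] = (value, expiresAt);

    // Returns the cached value, or null if it is missing or 'now' has reached its expiry.
    public string Get(string key, DateTime now)
    {
        if (!_entries.TryGetValue(key, out var entry)) return null;
        if (now < entry.ExpiresAt) return entry.Value;
        _entries.TryRemove(key, out _); // drop the stale entry
        return null;
    }
}

public static class SitemapCache
{
    // Cache-aside: only call buildSitemap on a miss, then keep the result for one minute.
    public static string GetOrBuild(AbsoluteExpiryCache cache, Func<string> buildSitemap, DateTime now)
    {
        string cached = cache.Get("sitemap", now);
        if (cached != null) return cached;
        string fresh = buildSitemap();
        cache.Set("sitemap", fresh, now.AddMinutes(1));
        return fresh;
    }
}
```

An absolute expiry bounds how stale the sitemap can get; a sliding expiry would keep it alive indefinitely while Googlebot keeps requesting it.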

Answer 2

I'm using Igor's solution with a slight twist.

First, I have the following browser file:

SearchBot.browser

<browsers>
    <browser id="Slurp" parentID="Mozilla">
        <identification>
            <userAgent match="Slurp" />
        </identification>
        <capabilities>
            <capability name="crawler" value="true" />
        </capabilities>
    </browser>
    <browser id="Yahoo" parentID="Mozilla">
        <identification>
            <userAgent match="http\:\/\/help.yahoo.com\/help\/us\/ysearch\/slurp" />
        </identification>
        <capabilities>
            <capability name="crawler" value="true" />
        </capabilities>
    </browser>
    <browser id="Googlebot" parentID="Mozilla">
        <identification>
            <userAgent match="Googlebot" />
        </identification>
        <capabilities>
            <capability name="crawler" value="true" />
        </capabilities>
    </browser>
    <browser id="msnbot" parentID="Mozilla">
        <identification>
            <userAgent match="msnbot" />
        </identification>
        <capabilities>
            <capability name="crawler" value="true" />
        </capabilities>
    </browser>
</browsers>
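The match attributes in a .browser file are regular expressions tested against the User-Agent header. To check which user agents this file would flag as crawler="true" (for example in a unit test), the four patterns can be mirrored in plain C#. This is a sketch under the assumption that each pattern behaves like an ordinary .NET Regex; it ignores the parentID="Mozilla" chain that ASP.NET also evaluates, so it may flag user agents that ASP.NET itself would not. The class name SearchBotBrowserFile is hypothetical.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SearchBotBrowserFile
{
    // The userAgent "match" patterns from SearchBot.browser, copied verbatim.
    private static readonly Regex[] CrawlerPatterns =
    {
        new Regex("Slurp"),
        new Regex(@"http\:\/\/help.yahoo.com\/help\/us\/ysearch\/slurp"),
        new Regex("Googlebot"),
        new Regex("msnbot"),
    };

    // True if any pattern matches, i.e. the browser file would set crawler="true".
    // Simplification: the parentID hierarchy is not evaluated here.
    public static bool IsCrawler(string userAgent) =>
        !string.IsNullOrEmpty(userAgent) && CrawlerPatterns.Any(p => p.IsMatch(userAgent));
}
```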

Then the ActionFilterAttribute:

Imports System.Web.Mvc
Imports System.Net
Imports System.Web

Namespace Filters
    <AttributeUsage(AttributeTargets.Method, AllowMultiple:=False)> _
    Public Class SearchBotFilter : Inherits ActionFilterAttribute

        Public Overrides Sub OnActionExecuting(ByVal c As ActionExecutingContext)
            If Not HttpContext.Current.Request.Browser.Crawler Then
                HttpContext.Current.Response.StatusCode = CInt(HttpStatusCode.NotFound)
                c.Result = New ViewResult() With {.ViewName = "NotFound"}
            End If
        End Sub
    End Class
End Namespace

And finally, my controller:

    <SearchBotFilter()> _
    Function Index() As ActionResult
        Return View()
    End Function

Thanks Igor, this is a great solution.

Answer 3

Another thing you can use is a DNS lookup, as explained in Verifying Googlebot.

You could add the reverse DNS lookup in a ViewEngine.
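The verification Google describes has three steps: run a reverse DNS lookup on the client IP, check that the host name is under googlebot.com or google.com, then run a forward lookup on that host name and confirm it resolves back to the same IP. Unlike a User-Agent check, this cannot be faked by setting a header. The sketch below uses System.Net.Dns; the class name GooglebotVerifier is hypothetical, and the domain list should be re-checked against Google's current documentation, since it is an assumption that it is still complete.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Sockets;

public static class GooglebotVerifier
{
    // Domains named in Google's "Verifying Googlebot" guidance (assumed current).
    private static readonly string[] AllowedSuffixes = { ".googlebot.com", ".google.com" };

    // Step 2: is the reverse-DNS host name under one of the allowed domains?
    public static bool IsGoogleHostName(string hostName)
    {
        if (string.IsNullOrEmpty(hostName)) return false;
        string host = hostName.TrimEnd('.').ToLowerInvariant();
        return AllowedSuffixes.Any(suffix => host.EndsWith(suffix, StringComparison.Ordinal));
    }

    // Steps 1-3: reverse lookup, domain check, forward lookup back to the same IP.
    // Both lookups are blocking network calls, so cache the verdict per IP in real use.
    public static bool IsVerifiedGooglebot(IPAddress clientIp)
    {
        try
        {
            string hostName = Dns.GetHostEntry(clientIp).HostName;      // reverse (PTR) lookup
            if (!IsGoogleHostName(hostName)) return false;
            return Dns.GetHostAddresses(hostName).Contains(clientIp);    // forward lookup
        }
        catch (SocketException)
        {
            return false; // no PTR record or DNS failure: treat as not verified
        }
    }
}
```

In an action filter, the client address could be obtained with IPAddress.Parse(c.HttpContext.Request.UserHostAddress) and passed to IsVerifiedGooglebot.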
