Googlebot does not recognize a dynamic robots.txt

Time: 2016-03-23 11:53:46

Tags: php laravel nginx robots.txt googlebot

I created a dynamic route with Laravel that serves a txt response.

It works in the browser, but Googlebot reports that there is no robots.txt file.

These are the headers I get:

Cache-Control: no-cache
Connection: keep-alive
Content-Disposition: inline; filename="robots.txt"
Content-Encoding: gzip
Content-Type: text/plain; charset=UTF-8
Date: Wed, 23 Mar 2016 11:36:44 GMT
Server: nginx/1.9.12
Transfer-Encoding: chunked
Vary: Accept-Encoding

This is my Laravel route:

Route::get('robots.txt', 'TxtController@robots');

And this is the method:

    public function robots() {
        return response()->view('txt.robots')
            ->header('Content-Type', 'text/plain')
            ->header('Content-Disposition', 'inline; filename="robots.txt"');
    }
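The view itself is not shown in the question; by Laravel's conventions, `txt.robots` resolves to `resources/views/txt/robots.blade.php`, which would hold plain robots.txt directives, for example (a hypothetical minimal allow-all file):

    User-agent: *
    Disallow: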

I tried using Content-Disposition: attachment; filename="robots.txt", but Google kept saying there was no robots.txt file.

I also tried removing Content-Disposition entirely, but it still does not work in Google Webmaster Tools (while it keeps working in the browser).

This is my nginx config, in case the problem is there:

```
server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name mydomain.com;
    root /home/forge/mydomain.com/public;

    # FORGE SSL (DO NOT REMOVE!)
    # ssl_certificate;
    # ssl_certificate_key;

    ssl_protocols TLSv1 TLSv1.1 TLSv1.2;

    index index.html index.htm index.php;

    charset utf-8;

    location / {
        try_files $uri $uri/ /index.php?$query_string;
    }

    location = /favicon.ico { access_log off; log_not_found off; }
    #location = /robots.txt  { access_log off; log_not_found off; }

    #location = /robots.txt {
    #    try_files $uri $uri/ /index.php?$args;
    #    access_log off;
    #    log_not_found off;
    #}

    access_log off;
    error_log  /var/log/nginx/mydomain.com-error.log error;

    error_page 404 /index.php;

    location ~ \.php$ {
        fastcgi_split_path_info ^(.+\.php)(/.+)$;
        fastcgi_pass unix:/var/run/php5-fpm.sock;
        fastcgi_index index.php;
        include fastcgi_params;
    }

    location ~ /\.ht {
        deny all;
    }

    # Expire rules for static content

    # cache.appcache, your document html and data
    location ~* \.(?:manifest|appcache|html?|xml|json)$ {
        expires -1;
        # access_log logs/static.log; # I don't usually include a static log
    }

    # Feed
    location ~* \.(?:rss|atom)$ {
        expires 1h;
        add_header Cache-Control "public";
    }

    # Media: images, icons, video, audio, HTC
    location ~* \.(?:jpg|jpeg|gif|png|ico|cur|gz|svg|svgz|mp4|ogg|ogv|webm|htc)$ {
        expires 1M;
        access_log off;
        add_header Cache-Control "public";
    }

    # CSS, Javascript and Fonts
    location ~* \.(?:css|js|woff|ttf|eot)$ {
        expires 1y;
        access_log off;
        add_header Cache-Control "public";
    }
}
```

Thanks.

3 Answers:

Answer 0 (score: 1)

When I check http://www.google.com/robots.txt, the HTTP response headers are:

Cache-Control:private, max-age=0
Content-Encoding:gzip
Content-Length:1574
Content-Type:text/plain
Date:Wed, 23 Mar 2016 12:07:44 GMT
Expires:Wed, 23 Mar 2016 12:07:44 GMT
Last-Modified:Fri, 04 Mar 2016 19:02:51 GMT
Server:sffe
Vary:Accept-Encoding
X-Content-Type-Options:nosniff
X-XSS-Protection:1; mode=block

Why not skip the Content-Disposition header and just output the text with a Content-Type: text/plain header?

Also...

  • Are you sure your robots.txt URL can be fetched from outside? Maybe use a proxy to double-check (see the PHP sketch after this list).
  • Is your output UTF-8 encoded?
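
If no proxy is at hand, a throwaway PHP script run from another machine shows the response exactly as an external client receives it (a minimal sketch; mydomain.com is the placeholder domain from the question):

    <?php
    // Fetch the live URL the way a non-browser client would.
    $body = file_get_contents('http://mydomain.com/robots.txt');

    // The HTTP stream wrapper fills $http_response_header with the raw
    // response headers, status line first.
    print_r($http_response_header);
    var_dump($body !== false, strlen((string) $body));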

For details, see https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt

Answer 1 (score: 1)

I solved this by adding a Content-Length header. The resulting code is:

    $response = response()->view('txt.robots')->header('Content-Type', 'text/plain');
    $response->header('Content-Length', strlen($response->getOriginalContent()));

    return $response;
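
Wrapped back into the controller method, the fix would presumably look like this (a sketch; the remark about chunked transfer is an inference from the Transfer-Encoding: chunked header shown in the question):

    public function robots()
    {
        // Render the view and set the plain-text type first.
        $response = response()->view('txt.robots')
            ->header('Content-Type', 'text/plain');

        // An explicit Content-Length lets nginx drop the
        // Transfer-Encoding: chunked seen in the original headers,
        // which presumably was what confused Googlebot.
        $response->header('Content-Length', strlen($response->getOriginalContent()));

        return $response;
    }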

I hope this helps. Thanks for the replies.

Answer 2 (score: 0)

The Content-Disposition header is used to force the browser to download a file. It may be confusing Googlebot; try serving the file without it:

public function robots(){
    return response()->view('txt.robots')->header('Content-Type', 'text/plain');
}