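I'm trying to debug my website's .htaccess + robots.txt. I want to use cURL or wget to try to access files that I have blocked with robots.txt, or pages that should redirect to another location.
My robots.txt contains the following:
User-agent: *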
Disallow: /wp/wp-admin/
However, I can still fetch it.
wget
$ wget http://xxxx.com/wp/wp-admin/
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
--2017-08-28 07:37:05-- http://xxxx.com/wp/wp-admin/
Resolving xxxx.com... 118.127.47.249
Connecting to xxxx.com|118.127.47.249|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://xxxx.com/wp/wp-login.php?redirect_to=http%3A%2F%2Fxxxx.com%2Fwp%2Fwp-admin%2F&reauth=1 [following]
--2017-08-28 07:37:12-- http://xxxx.com/wp/wp-login.php?redirect_to=http%3A%2F%2Fxxxx.com%2Fwp%2Fwp-admin%2F&reauth=1
Connecting to xxxx.com|118.127.47.249|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2891 (2.8K) [text/html]
Saving to: `wp-login.php@redirect_to=http%3A%2F%2Fxxxx.com%2Fwp%2Fwp-admin%2F&reauth=1'
100%[==============================================================================>] 2,891 --.-K/s in 0.1s
2017-08-28 07:37:17 (22.2 KB/s) - `wp-login.php@redirect_to=http%3A%2F%2Fxxxx.com%2Fwp%2Fwp-admin%2F&reauth=1' saved [2891/2891]
curl
$ curl -L xxx.com/wp/wp-admin -o wp-admin.html
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 1147 100 1147 0 0 107 0 0:00:10 0:00:10 --:--:-- 280
0 0 0 0 0 0 0 0 --:--:-- 0:01:37 --:--:-- 0
100 2891 100 2891 0 0 17 0 0:02:50 0:02:42 0:00:08 234
Neither wget nor curl respects robots.txt. Is there a way to check how my .htaccess + robots.txt are working? Thanks!
Answer 0 (score: 3)
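robots.txt is purely for search-engine bots. It is advisory and does not block access, so most user agents, including wget and curl as used here, ignore it (wget only consults it during recursive downloads). If you want to check whether your robots.txt parses correctly, you can use the checker in Google's webmaster console, which shows any errors and problems in the robots.txt file.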
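For a quick local check of what a compliant crawler would do with these rules, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules string is copied from the question, `xxxx.com` is the question's placeholder host, and `is_allowed` is a hypothetical helper name chosen for illustration. It only evaluates the rules; it does not fetch anything or block anyone.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt shown in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp/wp-admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch `url`.

    Hypothetical helper: parses the rules from a string, so no network
    access is needed.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "Googlebot" is just an example agent; it falls under "User-agent: *".
    for url in ("http://xxxx.com/wp/wp-admin/", "http://xxxx.com/wp/"):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

If the admin URL is reported as disallowed, the rules say what was intended, even though curl and wget can still download the page.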
Redirects configured in .htaccess should apply to any client, and wget should show those redirects.
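To inspect a single redirect hop without following it, one option is a small script built on Python's standard `http.client`, which never follows redirects on its own. This is only a sketch: `fetch_once` is a hypothetical helper name, and the URL in the usage comment is the question's placeholder.

```python
import http.client
from urllib.parse import urlsplit


def fetch_once(url: str, timeout: float = 10.0):
    """Send one GET request and return (status, Location header or None).

    http.client does not follow redirects, so a 301/302 from .htaccess
    (or from the application) is reported as-is.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


# Example usage (placeholder host from the question):
#   status, location = fetch_once("http://xxxx.com/wp/wp-admin/")
#   print(status, location)
```

Calling it again on the returned `Location` value walks the redirect chain one hop at a time, which makes it easier to see which rule produced each response.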