Using wget or curl to test a website's .htaccess + robots.txt

Date: 2017-08-27 23:55:26

Tags: .htaccess curl wget robots.txt

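I am trying to debug my website's .htaccess + robots.txt. I want to use cURL or wget to request files that I have blocked with robots.txt, and pages that .htaccess should redirect to another location.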

My robots.txt contains the following:

User-agent: *
Disallow: /wp/wp-admin/

However, I am still able to fetch it:

wget

$ wget http://xxxx.com/wp/wp-admin/
SYSTEM_WGETRC = c:/progra~1/wget/etc/wgetrc
syswgetrc = C:\Program Files (x86)\GnuWin32/etc/wgetrc
--2017-08-28 07:37:05--  http://xxxx.com/wp/wp-admin/
Resolving xxxx.com... 118.127.47.249
Connecting to xxxx.com|118.127.47.249|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: http://xxxx.com/wp/wp-login.php?redirect_to=http%3A%2F%2Fxxxx.com%2Fwp%2Fwp-admin%2F&reauth=1 [following]
--2017-08-28 07:37:12--  http://xxxx.com/wp/wp-login.php?redirect_to=http%3A%2F%2Fxxxx.com%2Fwp%2Fwp-admin%2F&reauth=1
Connecting to xxxx.com|118.127.47.249|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 2891 (2.8K) [text/html]
Saving to: `wp-login.php@redirect_to=http%3A%2F%2Fxxxx.com%2Fwp%2Fwp-admin%2F&reauth=1'

100%[==============================================================================>] 2,891       --.-K/s   in 0.1s

2017-08-28 07:37:17 (22.2 KB/s) - `wp-login.php@redirect_to=http%3A%2F%2Fxxxx.com%2Fwp%2Fwp-admin%2F&reauth=1' saved [2891/2891]

curl

$ curl -L xxx.com/wp/wp-admin -o wp-admin.html
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                Dload  Upload   Total   Spent    Left  Speed
100  1147  100  1147    0     0    107      0  0:00:10  0:00:10 --:--:--   280
0     0    0     0    0     0      0      0 --:--:--  0:01:37 --:--:--     0
100  2891  100  2891    0     0     17      0  0:02:50  0:02:42  0:00:08   234

Neither wget nor curl respects robots.txt. Is there a way to check how my .htaccess + robots.txt are behaving? Thanks!

1 answer:

Answer 0 (score: 3):

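robots.txt is purely advisory and aimed at search engine crawlers. Most user agents, including curl and wget when fetching a single URL, ignore it, so it never blocks access on its own. If you want to check whether your robots.txt parses correctly, you can use the robots.txt checker in Google's webmaster console, which reports any errors and problems found in the file.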
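As a local alternative to an online checker, here is a minimal sketch that asks Python's standard `urllib.robotparser` how a compliant crawler would interpret the rules from the question. The helper name `is_allowed` is made up for illustration, and the rules are pasted in as a string so nothing is fetched from the network.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt shown in the question.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp/wp-admin/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a robots.txt-compliant crawler may fetch `url`.

    `is_allowed` is a hypothetical helper, not part of any library.
    """
    parser = RobotFileParser()
    # parse() takes the file's lines; no HTTP request is made here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://xxxx.com/wp/wp-admin/", "http://xxxx.com/wp/"):
        print(url, "allowed" if is_allowed(ROBOTS_TXT, "*", url) else "disallowed")
```

This only tells you what a well-behaved crawler should do with the file. It says nothing about whether the server will actually serve the URL, which is why wget and curl can still download it.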

Redirects configured in .htaccess are applied by the server, so they should work for any client, and wget should show them.
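To look at a redirect without following it, one option is to send a single request and read the status code and `Location` header. The sketch below uses Python's standard `http.client`, which does not follow redirects by itself. The function name `first_response` is hypothetical, and it assumes the server answers HEAD requests the same way it answers GET.

```python
import http.client
from urllib.parse import urlsplit


def first_response(url: str, timeout: float = 10.0):
    """Send one HEAD request and return (status, Location header or None).

    http.client does not follow redirects, so the result is the server's
    first answer, including any redirect it issues.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.hostname, parts.port, timeout=timeout)
    try:
        # Rebuild the request target from the path and optional query string.
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()
```

For example, `first_response("http://xxxx.com/wp/wp-admin/")` would be expected to return a 3xx status and the target URL if the server redirects that path, much like the `302 Found` and `Location:` lines in the wget log above.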