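Is it possible to use wget to crawl a host for a specific file type? I'm archiving some documents from FTP, and I need it to crawl the whole host while downloading only the .txt files.
I tried this: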
wget mysite.com/ftplist --config=./.wgetrc
with the following .wgetrc:
accept = txt
check_certificate = off
connect_timeout = 3
cookies = off
dns_cache = off
follow_ftp = on
logfile = amz.log
max_redirect = 3
no_clobber = on
recursive = on
save_headers = on
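In case it helps anyone reproduce this without a config file, here is a rough sketch of what I believe are the equivalent command-line flags for the settings above. The function name crawl_txt is only a hypothetical wrapper for illustration, and I have not confirmed that this behaves any differently from the .wgetrc version.

```shell
# Sketch only: the same options as the .wgetrc above, passed as flags.
# crawl_txt is a hypothetical helper name, not part of wget.
# Defining the function does not run wget; call it with a start URL.
crawl_txt() {
    wget \
        --recursive \
        --accept=txt \
        --follow-ftp \
        --no-clobber \
        --no-check-certificate \
        --no-cookies \
        --no-dns-cache \
        --connect-timeout=3 \
        --max-redirect=3 \
        --save-headers \
        --output-file=amz.log \
        "$1"
}
```

It would be called as crawl_txt mysite.com/ftplist, which should be the same request as the command I ran.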
The command requests mysite.com/ftplist. That page contains a list of ftp:// URLs. wget requests the page but does not go any further; it seems to stop at that page.
Here is amz.log:
Saving to: ‘mysite.com/ftplinks/index.html.tmp’
0K .......... .......... .......... .......... .......... 656K
50K .......... .......... .......... .......... .......... 741K
100K .......... .......... .......... .......... .......... 1.12M
150K .......... .......... .......... .......... .......... 975K
200K .......... .......... .......... .......... .......... 935K
250K .......... .......... .......... .......... .......... 835K
300K .......... .......... .......... .......... .......... 870K
350K .......... .......... .......... .......... .......... 1.07M
400K .......... .......... .......... ....... 907K=0.5s
2018-12-20 17:55:54 (881 KB/s) - ‘mysite.com/ftplinks/index.html.tmp’ saved [447555]
Removing mysite.com/ftplinks/index.html.tmp since it should be rejected.
Am I missing something?