I am trying to download the HTML for an entire domain, but wget also follows and downloads the link below, even though the link declares rel="nofollow".
<a href="?s=" rel="nofollow" data-avia-search-tooltip="..." aria-hidden="true" data-av_icon="" data-av_iconfont="entypo-fontello" style="height: 88px; line-height: 88px;"><span class="avia_hidden_link_text">Search</span></a>
My wget command is as follows:
wget --no-cookies --ignore-tags=link -e robots=on --span-hosts --output-file=/home/markus/python/test/log.txt http://www.kilnbridge.com --domains kilnbridge.com -x -P /home/markus/python/test -r -E --html-extension -R gif,jpg,pdf,png,rss,php,zip,rar,z7,css,js,eot,svg,ttf,woff,exe --ignore-length --max-redirect=100 --quota=10000k --wait=0.1 --no-check-certificate --remote-encoding=encoding
I have tried various combinations of these options with both wget 1.15 and 1.18, without success.
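One workaround that may be worth trying is to filter the search link by URL instead of relying on rel="nofollow". This is only a sketch, and it rests on assumptions: that `-e robots=on` covers robots.txt and the robots meta tag but not the rel attribute of individual links, and that the `--reject-regex` option (which I believe exists since wget 1.14) is available in the build. The pattern `[?]s=`, the helper name `would_reject`, and the second test URL are made up for illustration. The snippet first checks with grep which URLs the pattern would catch, then shows in a comment where the pattern would go in a wget call.

```shell
#!/bin/sh
# Assumption: this pattern is meant to match the "?s=" search link.
# "[?]" is a literal question mark in both basic and extended POSIX regexes.
reject_pattern='[?]s='

# Hypothetical helper: succeeds (exit 0) if the URL matches the pattern.
would_reject() {
    printf '%s\n' "$1" | grep -Eq "$reject_pattern"
}

# Dry run: see which URLs the pattern would filter out.
for url in 'http://www.kilnbridge.com/?s=' 'http://www.kilnbridge.com/'; do
    if would_reject "$url"; then
        echo "reject $url"
    else
        echo "keep   $url"
    fi
done

# The same pattern would then be passed to wget, for example:
# wget -r --domains kilnbridge.com --reject-regex "$reject_pattern" http://www.kilnbridge.com
```

The dry run does not touch the network, so the pattern can be adjusted before adding it to the full command above.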
Output of wget --version:
GNU Wget 1.18 built on linux-gnu.
-cares +digest -gpgme +https +ipv6 +iri +large-file -metalink +nls +ntlm +opie -psl +ssl/openssl
Wgetrc:
/opt/wget/etc/wgetrc (system)
Locale:
/opt/wget/share/locale
Compile:
gcc -DHAVE_CONFIG_H -DSYSTEM_WGETRC="/opt/wget/etc/wgetrc"
-DLOCALEDIR="/opt/wget/share/locale" -I. -I../lib -I../lib
-DHAVE_LIBSSL -DNDEBUG
Link:
gcc -DHAVE_LIBSSL -DNDEBUG -luuid -lssl -lcrypto -lz -lidn
ftp-opie.o openssl.o http-ntlm.o ../lib/libgnu.a
Copyright (C) 2015 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later
<http://www.gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Originally written by Hrvoje Niksic <hniksic@xemacs.org>.
Please send bug reports and questions to <bug-wget@gnu.org>.