How do I make a Bash script that copies all the links from a website (without downloading the site)? The script should only gather all the links and then save them to a txt file.
I have tried this code:
wget --spider --force-html -r -l1 http://somesite.com | grep 'Saving to:'
Example: the site contains download links (e.g. ones mentioning dlink.com), so I only want to copy every entry that contains dlink.com and save them to a txt file.
I have googled this, but what I found did not help.
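Roughly, the kind of one-liner I was hoping for is sketched below (untested; somesite.com is just a placeholder, and grepping raw HTML like this is fragile):
curl -s http://somesite.com \
  | grep -Eo 'href="[^"]*dlink\.com[^"]*"' \
  | sed 's/^href="//; s/"$//' \
  | sort -u > links.txt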
Answer 0 (score: 2)
Use a proper parser in Perl:
#!/usr/bin/env perl
use strict;
use warnings;

use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;
use URI::URL;

my $ua = LWP::UserAgent->new;
my ($url, $p, $res);

if (@ARGV) {
    $url = $ARGV[0];
}
else {
    print "Enter a URL: ";
    $url = <>;
    chomp($url);
}

my @array = ();

# Collect the href of every <a> tag whose URL mentions dlink.com.
sub callback {
    my ($tag, %attr) = @_;
    return if $tag ne 'a';   # we only look closer at <a href ...>
    push(@array, $attr{href}) if defined $attr{href} && $attr{href} =~ /dlink\.com/i;
}

# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url).
$p = HTML::LinkExtor->new(\&callback);

# Request the document and parse it as it arrives.
$res = $ua->request(HTTP::Request->new(GET => $url),
                    sub { $p->parse($_[0]) });

# Expand all URLs to absolute ones.
my $base = $res->base;
@array = map { url($_, $base)->abs } @array;

# Print them out, one per line.
print join("\n", @array), "\n";
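Assuming the script above is saved as, say, extract_links.pl (the filename is just an example), its output can be redirected to produce the txt file the question asks for:
perl extract_links.pl http://somesite.com > links.txt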