Spider a website and retrieve all links containing a keyword

Time: 2014-12-15 20:22:17

Tags: bash copy wget

How do I make a Bash script that copies all the links on a website (without downloading the site itself)? It only needs to collect every link and save them to a txt file.

I tried this code:

wget --spider --force-html -r -l1 http://somesite.com | grep 'Saving to:'

Example: the site contains download links (e.g. to dlink.com), so I only want to copy every link that contains dlink.com and save them to a txt file.

I have googled around, but nothing I found was of any use.
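Roughly, what I am trying to do looks like this (an untested sketch assuming GNU wget and grep; wget --spider writes its log to stderr, so it has to be redirected before grep sees anything, and the URL pattern is only a guess):

# Untested sketch: crawl one level deep, keep only URLs mentioning dlink.com
wget --spider --force-html -r -l1 http://somesite.com 2>&1 \
  | grep -oE 'https?://[^ ]+' \
  | grep -i 'dlink\.com' \
  | sort -u > links.txt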

1 Answer:

Answer 0 (score: 2)

In Perl, using a proper parser:

#!/usr/bin/env perl

use strict;
use warnings;
use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

my $ua = LWP::UserAgent->new;
my ($url, $f, $p, $res);

if(@ARGV) { 
    $url = $ARGV[0]; }
else {
    print "Enter an URL : ";
    $url = <>;
    chomp($url);
}

my @array = ();
sub callback {
   my($tag, %attr) = @_;
   return if $tag ne 'a';  # we only look closer at <a href ...>
   push(@array, $attr{href}) if defined $attr{href} && $attr{href} =~ /dlink\.com/i;
}

# Make the parser.  Unfortunately, we don’t know the base yet
# (it might be different from $url)
$p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
$res = $ua->request(HTTP::Request->new(GET => $url),
                    sub {$p->parse($_[0])});

# Expand all URLs to absolute ones
my $base = $res->base;
@array = map { url($_, $base)->abs } @array;

# Print them out
print join("\n", @array), "\n";
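If you save the script as, say, extract_links.pl (the file name is just an example), you get the txt file the question asks for by redirecting its output:

perl extract_links.pl http://somesite.com > links.txt

Compared to grepping wget's log, HTML::LinkExtor pulls the hrefs out of the parsed HTML itself, and URI::URL resolves relative links against the response's base URL before they are printed.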