Many problems with LWP and HTML::Parser

Time: 2015-01-04 16:49:57

Tags: perl post get html-parsing lwp

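Hi everyone, I wrote a script to search for a string on a web page, but the request doesn't work and I don't know why...

Website: http://www.matrixx.com/
String to search for: solutions

Code: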

#!/usr/bin/perl
use strict;
use IO::Socket;
use Term::ANSIColor;
use HTML::Parser;
use LWP::UserAgent;
use LWP::Simple;
use vars qw( $PROG );

$SIG{'INT'} = sub {exit;};

my $stringsearch = "solutions";

my $url = "http://www.matrixx.com/";
my $ua = LWP::UserAgent->new;
print "\e[96m[!]Searching \e[31m$url\n\e[0m";    
my $response = $ua->post($url);
if ( !$response->is_success ) 
{
 print "error\n";
}


my $parser = HTML::Parser->new( 'text_h' => [ \&text_handler, 'dtext' ] );
$parser->parse( $response->decoded_content );

sub text_handler 
{
    chomp( my $text = shift );
    if ( $text =~ /$stringsearch/i )
    {

        print "\e[96m[+]Found: \e[32m$url\e[0m\n";

    }

    else
    {
        print "Not Found \n";
    }
}

2 answers:

Answer 0 (score: 3)

In fewer lines, using HTML::TreeBuilder::XPath:

#!/usr/bin/perl
use strict;
use LWP::UserAgent;
use HTML::TreeBuilder::XPath;

my $stringsearch = "solutions";

my $url = "http://www.matrixx.com/";
my $ua = LWP::UserAgent->new;
my $response = $ua->get($url);
die "Http error\n" unless $response->is_success;

my $tree = HTML::TreeBuilder::XPath->new_from_content(
    $response->decoded_content
);

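# Note: this XPath matches element names and attribute values that contain
# the string (case-sensitive); it does not inspect text nodes.
print "searched string found\n" if $tree->exists(
  "//*[contains(name(), '$stringsearch')] | //@*[contains(., '$stringsearch')]"
);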

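Answer 1 (score: 1)

text_handler is called once for every text fragment on the page. It does find your search string, but only in one of those fragments; for all the others you are printing Not Found.

If you only want to print Found or Not Found once per URL, do this: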
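To see the per-fragment behaviour in isolation, here is a minimal sketch that parses a small inline page rather than the real site. The sample HTML and the helper name text_fragments are made up for illustration; unbroken_text(1) is enabled so each block of text should arrive in one piece.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample page, used here instead of a live HTTP request.
my $html = '<html><body><h1>Products</h1><p>Our solutions</p>'
         . '<p>Contact</p></body></html>';

# Return every text fragment that the parser passes to the text handler.
sub text_fragments {
    my ($content) = @_;
    my @fragments;
    my $parser = HTML::Parser->new(
        api_version => 3,
        text_h      => [ sub { push @fragments, shift }, 'dtext' ],
    );
    $parser->unbroken_text(1);    # report each block of text in one piece
    $parser->parse($content);
    $parser->eof;                 # flush any buffered text
    return @fragments;
}

# The handler logic from the question, applied to each fragment in turn.
for my $fragment ( text_fragments($html) ) {
    my $verdict = $fragment =~ /solutions/i ? 'Found' : 'Not Found';
    print "$verdict: '$fragment'\n";
}
```

Only the fragment that contains the search string should report Found; every other fragment reports Not Found, which is the repeated output seen with the original script.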

my $found;
my $text_handler = sub {
    chomp( my $text = shift );
    if ( $text =~ /$stringsearch/i ) {
        $found = 1;
    }
};
my $parser = HTML::Parser->new( 'text_h' => [ $text_handler, 'dtext' ] );
$parser->parse( $response->decoded_content );
if ($found) {
    print "\e[96m[+]Found: \e[32m$url\e[0m\n";
}
else {
    print "Not Found\n";
}
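The same idea can be wrapped in a small function, sketched below under a few assumptions. The name page_contains is hypothetical, and the search string is quoted with \Q...\E so it is matched literally, which differs slightly from the regex in the code above. The handler also calls eof on the parser to stop at the first hit, which HTML::Parser allows from inside a handler.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical helper: returns 1 if $needle occurs in any text fragment
# of $content (case-insensitive, literal match), otherwise 0.
sub page_contains {
    my ( $content, $needle ) = @_;
    my $found  = 0;
    my $parser = HTML::Parser->new(
        api_version => 3,
        text_h      => [
            sub {
                my ( $self, $text ) = @_;
                if ( $text =~ /\Q$needle\E/i ) {
                    $found = 1;
                    $self->eof;    # abort parsing after the first match
                }
            },
            'self, dtext',         # pass the parser object, then decoded text
        ],
    );
    $parser->unbroken_text(1);     # avoid a match being split across fragments
    $parser->parse($content);
    $parser->eof unless $found;    # flush buffered text if parsing ran to the end
    return $found;
}

# Usage with inline sample HTML (not the real site):
print page_contains( '<p>Our Solutions</p>', 'solutions' )
    ? "Found\n"
    : "Not Found\n";
```

In the original script this would be called as page_contains( $response->decoded_content, $stringsearch ), printing the result once per URL. A string that is split by inline markup, such as solu<b>tions</b>, would still be missed, because each side of the tag is reported as a separate fragment.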

(If this doesn't answer your question, please be more explicit about what you are seeing and how it differs from what you expect to see.)