I want to write a program:
How do I connect to a web page and read data from it? And how do I save that data?
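Answer 0 (score: 4)
Perl has a variety of web toolkits suited to different tasks. You could consider LWP::UserAgent + HTML::Tree, Web::Query, or Mojo. I prefer Mojo.
Once we have the page, we can use CSS selectors to extract the data we are interested in. Here, I look at the newest perl questions: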
use strict; # safety net
use warnings; # safety net
use feature 'say'; # a better "print"
use Mojo::UserAgent; # load the user agent class explicitly
# fetch the stackoverflow perl page
# (https is used here because Mojo::UserAgent does not follow redirects by default)
my $ua = Mojo::UserAgent->new;
my $perl_page = $ua->get('https://stackoverflow.com/questions/tagged/perl')->res->dom;
# extract all questions:
my $questions = $perl_page->at('#questions')
    or die "No #questions element found; the page markup may have changed\n";
for my $question ($questions->find('h3 > a')->each) {
    say $question->all_text;
    say " <", $question->attr('href'), ">";
}
Output:
Perl script, parse text file between words
</questions/20432447/perl-script-parse-text-file-between-words>
Having issues with Spreadsheet::WriteExcel that makes me run the script twice to get desired file
</questions/20432157/having-issues-with-spreadsheetwriteexcel-that-makes-me-run-the-script-twice-to>
Calculate distance between a single atom and other atoms in a pdb file; print issue
</questions/20431884/calculate-distance-between-a-single-atom-and-other-atoms-in-a-pdb-file-print-is>
Exit status of child spawned in a pipe
</questions/20431810/exit-status-of-child-spawned-in-a-pipe>
How get data from a web page and save it with perl?
</questions/20431443/how-get-data-from-a-web-page-and-save-it-with-perl>
GatoIcon.py automatically generated <?> from images via perl?
</questions/20431389/gatoicon-py-automatically-generated-from-images-via-perl>
How and when can I use PPMs that weren't built in in ActivePerl 5.18?
</questions/20430599/how-and-when-can-i-use-ppms-that-werent-built-in-in-activeperl-5-18>
Translating perl to python - What does this line do (class variable confusion)
</questions/20429516/translating-perl-to-python-what-does-this-line-do-class-variable-confusion>
Fix files “corrupted” by Perl
</questions/20427916/fix-files-corrupted-by-perl>
how to add slash separator in perl
</questions/20427499/how-to-add-slash-separator-in-perl>
negative look ahead on whole number but preceded by a character(perl)
</questions/20426507/negative-look-ahead-on-whole-number-but-preceded-by-a-characterperl>
Use variable expansion in heredoc while piping data to gnuplot
</questions/20426379/use-variable-expansion-in-heredoc-while-piping-data-to-gnuplot>
How do I create multiple database connections in Catalyst with DBIC
</questions/20425107/how-do-i-create-multiple-database-connections-in-catalyst-with-dbic>
Moose's attribute vs simple sub?
</questions/20424929/mooses-attribute-vs-simple-sub>
How to use unicode in perl CGI param
</questions/20424488/how-to-use-unicode-in-perl-cgi-param>
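The question also asks how to save the data, which the script above does not show. The following is a minimal sketch of one way to do it, not part of the original answer. It assumes Mojo::DOM from the Mojolicious distribution; the inline HTML, the helper names `extract_rows` and `save_rows`, and the file name `questions.tsv` are all hypothetical choices made so the example runs without network access.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical stand-in for a fetched page, so the sketch runs offline.
my $html = <<'HTML';
<div id="questions">
  <h3><a href="/questions/1/first">First question</a></h3>
  <h3><a href="/questions/2/second">Second question</a></h3>
</div>
HTML

# Build one "title<TAB>link" string per question link.
sub extract_rows {
    my ($dom) = @_;
    my $rows = $dom->find('#questions h3 > a')
                   ->map(sub { join "\t", $_->all_text, $_->attr('href') })
                   ->to_array;
    return @$rows;
}

# Write one row per line; returns the number of rows written.
sub save_rows {
    my ($file, @rows) = @_;
    open(my $out, '>:encoding(UTF-8)', $file)
        or die "Cannot open $file: $!";
    print {$out} "$_\n" for @rows;
    close($out) or die "Cannot close $file: $!";
    return scalar @rows;
}

my @rows = extract_rows(Mojo::DOM->new($html));
save_rows('questions.tsv', @rows);
```

With a live page, the `$perl_page` object from the script above could be passed to `extract_rows` in place of the parsed inline HTML.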
Answer 1 (score: 2)
You can use WWW::Mechanize to access the content of a web page, and even to log in and navigate across multiple pages:
use WWW::Mechanize;
my $mech = WWW::Mechanize->new();
$mech->get( $url );
$mech->follow_link( n => 3 );
$mech->follow_link( text_regex => qr/download this/i );
$mech->follow_link( url => 'http://host.com/index.html' );
$mech->submit_form(
form_number => 3,
fields => {
username => 'mungo',
password => 'lost-and-alone',
}
);
$mech->submit_form(
form_name => 'search',
fields => { query => 'pot of gold', },
button => 'Search Now'
);
# get all textarea controls whose names begin with "customer"
my @customer_text_inputs = $mech->find_all_inputs(
type => 'textarea',
name_regex => qr/^customer/,
);
# get all text or textarea controls called "customer"
# (a separate variable name avoids redeclaring @customer_text_inputs)
my @customer_inputs = $mech->find_all_inputs(
type_regex => qr/^(text|textarea)$/,
name => 'customer',
);
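The snippets above cover navigation and forms but not saving. As a rough sketch, assuming the `save_content` and `status` methods of WWW::Mechanize, a fetched page could be written to disk as follows. The helper name `fetch_and_save` and the command-line handling are hypothetical and not part of the original answer.

```perl
use strict;
use warnings;
use WWW::Mechanize;

# Fetch $url and write the response body to $file.
# Returns the HTTP status code of the response.
sub fetch_and_save {
    my ($url, $file) = @_;
    # autocheck => 1 makes a failed request die instead of passing silently
    my $mech = WWW::Mechanize->new( autocheck => 1 );
    $mech->get($url);
    $mech->save_content($file);
    return $mech->status;
}

# Usage: perl fetch_and_save.pl URL FILE
fetch_and_save(@ARGV[0, 1]) if @ARGV == 2;
```

Because the `$mech` object keeps cookies between requests, the same approach should also work after a `submit_form` login step, provided the same object is reused.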
Answer 2 (score: 1)
You need to load a library to connect to the other server, then open a file and write (print) the content to it:
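use strict;
use warnings;
use LWP::Simple;
# take the page address from the command line (the original left $url undefined)
my $url = shift or die "Usage: $0 URL\n";
my $content = get($url);
defined $content or die "Could not fetch $url\n"; # get() returns undef on failure
# append the page content to data.txt
open(my $fh, '>>', 'data.txt') or die "Cannot open data.txt: $!";
print {$fh} $content;
close($fh) or die "Cannot close data.txt: $!";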
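When the goal is only to save the page, LWP::Simple can also write the response to a file directly. The sketch below assumes the `getstore` and `is_success` functions exported by LWP::Simple; the helper name `save_page` and the default file name are hypothetical. Unlike the code above, `getstore` overwrites the target file rather than appending to it.

```perl
use strict;
use warnings;
use LWP::Simple qw(getstore is_success);

# Download $url straight into $file.
# Dies with the HTTP status code if the request fails.
sub save_page {
    my ($url, $file) = @_;
    my $status = getstore($url, $file);
    die "Fetching $url failed with status $status\n"
        unless is_success($status);
    return $status;
}

# Usage: perl save_page.pl URL [FILE]
if (@ARGV) {
    my $url  = $ARGV[0];
    my $file = @ARGV > 1 ? $ARGV[1] : 'data.txt';
    save_page($url, $file);
}
```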
An e-book of the Perl manual in Windows help file (CHM) format is available at https://code.google.com/p/htmlhelp/downloads/detail?name=perl-5.10.0.chm.