Question

I wrote a simple program to fetch the HTML of a web page. The code works for some websites but not for others, e.g.

DOESN'T WORK

use strict;
use warnings;
use LWP::Simple;
sub main{
print "downloading...\n";
print get ("http://en.wikipedia.org/wiki/Main_Page");
print "Finished..\n";
}
main();

WORKS

use strict;
use warnings;
use LWP::Simple;
sub main{
print "downloading...\n";
print get ("https://www.google.com/");
print "Finished..\n";
}
main();

For the Wikipedia page it produces an error; for the Google page it does not. What am I missing here? By the way, I am new to Perl.

Thanks

Answer 1

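You are not checking for errors. get() returns undef when the request fails, and printing undef is what triggers the "uninitialized value" warning. As the LWP::Simple documentation suggests, check that the return value of get() is defined: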

my $content = get("http://en.wikipedia.org/wiki/Main_Page");
die "Couldn't get it!" unless defined $content;

or

my $content = get("http://en.wikipedia.org/wiki/Main_Page") 
  // die "Couldn't get it!";

print $content;
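
LWP::Simple only tells you that the request failed, not why. If you want the reason, one option is LWP::UserAgent, whose response object carries the HTTP status line. The following is a minimal sketch, not the code from the answer above: the helper name fetch_page and the 10-second timeout are assumptions made for illustration, and the URL is read from the command line.

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not part of LWP): returns (content, undef) on success
# or (undef, status line) on failure, so the caller can report the reason.
sub fetch_page {
    my ($url) = @_;
    my $ua = LWP::UserAgent->new(timeout => 10);   # timeout value is an assumption
    my $response = $ua->get($url);
    return ($response->decoded_content, undef) if $response->is_success;
    return (undef, $response->status_line);
}

# Fetch the URL given on the command line, if any.
if (@ARGV) {
    my ($content, $error) = fetch_page($ARGV[0]);
    die "Couldn't get $ARGV[0]: $error\n" unless defined $content;
    binmode STDOUT, ':encoding(UTF-8)';            # decoded content may contain wide characters
    print $content;
}
```

Run it as, for example, perl fetch.pl http://en.wikipedia.org/wiki/Main_Page (the script name is arbitrary). The status line printed on failure should point to the actual cause, which is worth checking before guessing.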

Answer 2

Maybe you need to try downloading the page more than once. If you are on Linux, you can use system("wget -T 10 -q -N --unlink -O $savepath $url")
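
The retry idea can be sketched in Perl as a loop around the same wget call. This is only an illustration under stated assumptions: the helper name download_with_retries, the retry count of 3, and the 2-second pause are all made up for the example, and it keeps only the timeout, quiet, and output-file flags from the command above. It uses the list form of system() so that the URL and path are not interpreted by the shell.

```perl
use strict;
use warnings;

# Hypothetical helper: run wget up to $max_tries times. Returns 1 on the
# first successful attempt and 0 if every attempt fails.
sub download_with_retries {
    my ($url, $savepath, $max_tries) = @_;
    for my $attempt (1 .. $max_tries) {
        # List-form system() bypasses the shell, so special characters in
        # $url or $savepath reach wget unchanged.
        my $status = system('wget', '-T', '10', '-q', '-O', $savepath, $url);
        return 1 if $status == 0;
        warn "Attempt $attempt of $max_tries failed\n";
        sleep 2 if $attempt < $max_tries;   # pause length is an assumption
    }
    return 0;
}

# Usage: perl script.pl URL SAVEPATH
if (@ARGV == 2) {
    my ($url, $savepath) = @ARGV;
    download_with_retries($url, $savepath, 3)
        or die "Giving up on $url\n";
}
```

Retrying only helps with transient failures. If the server rejects the request every time, the loop will simply fail three times, so it is still worth checking the reported error first.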

Error when getting HTML text from a website using Perl: "Use of uninitialized value in line"