How do I scrape a website using an iPad User-Agent?

Posted: 2011-06-22 11:56:07

Tags: php ipad curl user-agent scrape

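I am using the PHP code below to fetch and print the page source with cURL, but the markup I am looking for does not appear in the output. When I load the same site on an iPad, or in Safari with the User-Agent set to iPad, that markup is present.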

Thanks!

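<?php
    // iPad (iOS 3.2) Mobile Safari User-Agent string
    // (the original had a stray "')" at the end, which corrupted the header)
    $useragent = "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10";

    $ch = curl_init("http://www.cbsnews.com/video/watch/?id=7370279n&tag=mg;mostpopvideo");

    curl_setopt($ch, CURLOPT_USERAGENT, $useragent);   // send the iPad User-Agent
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);    // return the body instead of printing it
    // curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // uncomment to follow redirects

    $output = curl_exec($ch);
    if ($output === false) {
        echo 'cURL error: ' . curl_error($ch);
    } else {
        echo $output;
    }

    curl_close($ch);
?>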
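When the fetched HTML lacks markup that a real iPad shows, two likely causes are a redirect to a separate mobile page that is not being followed, or markup that the page adds with JavaScript, which cURL never executes. The sketch below, assuming the PHP cURL extension and PHP 7 or later, follows redirects and reports the final status and URL so those two cases can be told apart. The helper names `fetch_as_ipad` and `raw_html_contains`, the returned array keys, and the `<video` marker in the usage comment are hypothetical, not taken from the question.

```php
<?php
// iPad (iOS 3.2) Mobile Safari User-Agent string, as in the question.
const IPAD_UA = 'Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) '
    . 'AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10';

/**
 * Hypothetical helper: fetch $url while identifying as an iPad.
 * Returns the HTTP status, the URL reached after any redirects, and the body.
 */
function fetch_as_ipad(string $url): array
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_USERAGENT      => IPAD_UA, // pretend to be Mobile Safari on an iPad
        CURLOPT_RETURNTRANSFER => true,    // return the body as a string
        CURLOPT_FOLLOWLOCATION => true,    // follow redirects, e.g. to a mobile page
        CURLOPT_MAXREDIRS      => 5,       // but give up after a few hops
    ]);
    $body = curl_exec($ch);
    if ($body === false) {
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL error: $error");
    }
    $result = [
        'status'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
        'final_url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
        'body'      => $body,
    ];
    curl_close($ch);
    return $result;
}

/**
 * True if $needle occurs in the HTML as the server sent it. Markup that the
 * page adds with JavaScript will not be found, because cURL runs no scripts.
 */
function raw_html_contains(string $html, string $needle): bool
{
    return strpos($html, $needle) !== false;
}

// Usage (performs a network request, so left commented out):
// $page = fetch_as_ipad('http://www.cbsnews.com/video/watch/?id=7370279n&tag=mg;mostpopvideo');
// echo $page['status'], ' ', $page['final_url'], "\n";
// var_dump(raw_html_contains($page['body'], '<video'));
```

If `final_url` differs from the requested URL, the server redirected the iPad client; if the status is 200 and the marker is still missing, the markup is probably generated client-side.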

1 answer:

Answer 0 (score: 4)

Try running curl from the command line, driven by a Perl script, for example:

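use strict;
use warnings;
use File::Path qw(make_path);

my $ua        = "Mozilla/5.0 (iPad; U; CPU OS 3_2 like Mac OS X; en-us) AppleWebKit/531.21.10 (KHTML, like Gecko) Version/4.0.4 Mobile/7B334b Safari/531.21.10";
my @curl      = ('curl', '-A', $ua);   # curl command with the iPad User-Agent
my $server    = "http://www.cbsnews.com";
my $startpage = "$server/video/watch/?id=7370279n&tag=mg;mostpopvideo";
my $path      = "/path/to/download/to";

# List-form open runs curl without a shell, so the '&' and ';' in the URL are safe.
open(my $fh, '-|', @curl, '-L', $startpage) or die "Cannot open website: $!";
while (my $line = <$fh>)
{
    # Link to a file directly under the server root, e.g. href="http://www.cbsnews.com/foo.html"
    if ($line =~ /<a\s+[^>]*href="\Q$server\E\/([^"\/]+)"/)
    {
        my $file = $1;
        system(@curl, '-e', $startpage, '-o', "$path/$file", "$server/$file");
        next;
    }

    # Link to a file inside a folder, e.g. href="http://www.cbsnews.com/a/b/foo.html"
    if ($line =~ /<a\s+[^>]*href="\Q$server\E\/([^"]+)\/([^"\/]+)"/)
    {
        my $folder = $1;
        my $file   = "$folder/$2";
        make_path("$path/$folder");        # same as mkdir -p
        system(@curl, '-e', $startpage, '-o', "$path/$file", "$server/$file");
        next;
    }
}
close($fh);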
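Since the question is in PHP, the link-matching part of the script above can also be written with `preg_match_all`. This is a minimal sketch, not the answerer's code: the function name `classify_links` is hypothetical, and like the Perl version it only considers absolute `<a href="...">` links on the same server. Unlike a line-by-line loop, which sees at most one link per line, it collects every matching link in the page.

```php
<?php
/**
 * Hypothetical helper mirroring the two Perl regexes: find <a href="..."> links
 * that point at $server and split each one into [folder, file]. Links directly
 * under the server root get an empty folder; links ending in "/" are skipped.
 */
function classify_links(string $html, string $server): array
{
    // Escape regex metacharacters in the server URL ('.', ':', ...) for the '#' delimiter.
    $base    = preg_quote(rtrim($server, '/'), '#');
    $pattern = '#<a\s+[^>]*href="' . $base . '/([^"]+)"#';
    preg_match_all($pattern, $html, $matches);

    $links = [];
    foreach ($matches[1] as $target) {
        $slash = strrpos($target, '/');
        if ($slash === false) {
            $links[] = ['', $target];            // e.g. "index.html"
            continue;
        }
        $folder = substr($target, 0, $slash);
        $file   = substr($target, $slash + 1);
        if ($file !== '') {
            $links[] = [$folder, $file];         // e.g. ["video/watch", "clip.mp4"]
        }
    }
    return $links;
}
```

Each `[folder, file]` pair corresponds to one download in the script above: create `$path/$folder` if needed, then fetch `$server/$folder/$file` into it.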