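I'm scraping several websites and everything works fine,
but... there's one particular site I'm trying to scrape that does a few "redirects" before it lands on the page I want.
So it goes like this...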
http://www.example.com/?day=01/01/2016&action=search_prices
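This goes to http://www.example.com/search/default.aspx, which takes a few seconds to search and then displays the results page there.
Is there an easy way to handle this? Any tips, leads, etc. would be great.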
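One thing worth ruling out: `CURLOPT_FOLLOWLOCATION` only follows HTTP-level redirects (a 3xx status with a `Location` header). cURL does not run JavaScript and does not act on `<meta http-equiv="refresh">` tags, so if the hop to `/search/default.aspx` happens inside the page, cURL simply stops at the first page. A minimal sketch, assuming the hop is a meta refresh (nothing above confirms that); `extract_meta_refresh_url` is a hypothetical helper, and its regex assumes `http-equiv` appears before `content` in the tag:

```php
<?php
// Hypothetical helper: return the absolute target URL of a <meta http-equiv="refresh">
// tag, or null if the page has none. cURL's CURLOPT_FOLLOWLOCATION ignores these tags.
// Assumes $baseUrl is an absolute http(s) URL (the page the HTML came from).
function extract_meta_refresh_url(string $html, string $baseUrl): ?string
{
    // Matches e.g. <meta http-equiv="refresh" content="3; url=/search/default.aspx">
    // Assumption: http-equiv comes before content inside the tag.
    $pattern = '/<meta[^>]*http-equiv=["\']?refresh["\']?[^>]*content=["\']?\s*\d*\s*;\s*url=["\']?([^"\'>\s]+)/i';
    if (!preg_match($pattern, $html, $m)) {
        return null;
    }
    $target = html_entity_decode($m[1]); // e.g. turn &amp; back into &

    // Already absolute: use it as-is.
    if (preg_match('#^https?://#i', $target)) {
        return $target;
    }

    // Build scheme://host[:port] from the page we fetched.
    $base   = parse_url($baseUrl);
    $origin = $base['scheme'] . '://' . $base['host']
        . (isset($base['port']) ? ':' . $base['port'] : '');

    // Root-relative path such as /search/default.aspx.
    if ($target[0] === '/') {
        return $origin . $target;
    }

    // Path-relative: resolve against the directory part of the base path.
    $path = $base['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $target;
}
```

The returned URL would then be passed back into `get_web_page()`. If the hop is done by JavaScript instead, this returns `null`, and the script on the first page has to be read to see which request it actually makes.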
Here's my simple code right now (almost all the sites I scrape return JSON):
```php
function get_web_page( $url ){
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return the page as a string
        CURLOPT_HEADER         => false,    // don't include response headers in the output
        CURLOPT_FOLLOWLOCATION => true,     // follow HTTP redirects
        CURLOPT_ENCODING       => "",       // accept all supported encodings
        CURLOPT_USERAGENT      => "spider", // User-Agent to send
        CURLOPT_AUTOREFERER    => true,     // set Referer when following a redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // connect timeout (seconds)
        CURLOPT_HTTPHEADER     => array('HeaderName: HeaderValue'), // placeholder extra header
        CURLOPT_TIMEOUT        => 120,      // maximum time for the whole request (seconds)
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
        CURLOPT_SSL_VERIFYPEER => false     // disable SSL certificate checks
    );

    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );         // transfer info, incl. final 'url' and 'redirect_count'
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
```
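If the hops are plain HTTP redirects, another possibility is that the search page depends on a session cookie set by the first URL (ASP.NET sites often keep state that way), which the code above never stores. A minimal sketch under that assumption: it reuses one cURL handle with `CURLOPT_COOKIEFILE` set to an empty string, which enables libcurl's in-memory cookie engine, so cookies received on earlier requests are sent on later ones. `fetch_in_session` is a hypothetical name, and the timeouts are arbitrary:

```php
<?php
// Hypothetical helper: fetch several URLs in order over ONE cURL handle with the
// cookie engine enabled, so a session cookie set by an earlier hop is sent on later ones.
function fetch_in_session(array $urls): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,  // still follow normal HTTP 3xx redirects
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_AUTOREFERER    => true,
        CURLOPT_COOKIEFILE     => '',    // empty string: enable in-memory cookie engine
        CURLOPT_CONNECTTIMEOUT => 30,
        CURLOPT_TIMEOUT        => 30,
    ]);

    $pages = [];
    foreach ($urls as $url) {
        curl_setopt($ch, CURLOPT_URL, $url);
        $content = curl_exec($ch);       // false on failure
        $pages[] = [
            'url'     => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL), // where we ended up
            'code'    => curl_getinfo($ch, CURLINFO_HTTP_CODE),
            'errno'   => curl_errno($ch),
            'content' => $content,
        ];
    }
    curl_close($ch);
    return $pages;
}
```

For example, `fetch_in_session(['http://www.example.com/?day=01/01/2016&action=search_prices', 'http://www.example.com/search/default.aspx'])`, then inspect the last entry's `content`. To keep cookies between script runs, `CURLOPT_COOKIEJAR` can point at a file that libcurl writes when the handle is closed.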