Using a proxy with PHP Simple HTML DOM Parser

Date: 2014-07-21 22:14:44

Tags: php html web-scraping simple-html-dom

I'm having trouble getting PHP Simple HTML DOM Parser to use a proxy. I read the instructions in the manual, but it still won't cooperate.

require_once('simple_html_dom.php');

$url = 'http://www.whatsmyip.org/';
$proxy = '00.000.000.80:80';

$context = array( 
   'http' => array( 
      'proxy' => $proxy,
      'request_fulluri' => true, 
    ), 
);
$context = stream_context_create($context); 

$dom = new simple_html_dom();
$dom = file_get_html($url, false, $context);

echo '<pre>';
print_r($dom);
echo '</pre>';
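When a proxy seems to be ignored, one quick sanity check is to read the options back out of the context with `stream_context_get_options()` and confirm they are attached as intended. The sketch below does only that; it sends no request. The proxy address is a hypothetical placeholder from the documentation-only IP range, written in the `tcp://host:port` form the PHP manual uses for the http `proxy` option.

```php
<?php
// Hypothetical placeholder proxy (203.0.113.0/24 is reserved for documentation).
$proxy = 'tcp://203.0.113.10:8080';

// Same shape of context as in the question.
$context = stream_context_create(array(
    'http' => array(
        'proxy'           => $proxy,
        'request_fulluri' => true,
    ),
));

// Read the options back to confirm what the http wrapper will receive.
$options = stream_context_get_options($context);
var_export($options['http']);
echo PHP_EOL;
```

This only proves the context is built correctly; if it is, a remaining failure points at the proxy server itself.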

2 answers:

Answer 0 (score: 2)

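I only changed a few parts, but apparently the sample proxy you provided doesn't work. Note that the proxy below is written as tcp://host:port, the form the PHP manual uses for this option. Try this: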

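require_once('simple_html_dom.php');

// The PHP manual documents the http 'proxy' option in tcp://host:port form
$context = array('http' => array('proxy' => 'tcp://221.176.14.72:80', 'request_fulluri' => true));
$stream = stream_context_create($context);
$dom = file_get_html('http://www.whatsmyip.org/', false, $stream);
// file_get_html() returns false on failure; find() with index 0 returns the first match or null
$node = $dom ? $dom->find('span#ip', 0) : null;
echo $node ? $node->innertext : 'Request failed or span#ip not found';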
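Proxy lists usually give addresses as plain `host:port`. A small normalizing step keeps the value in the `tcp://host:port` form used above. `normalize_proxy()` is a hypothetical helper, not part of PHP or Simple HTML DOM; it leaves any value that already contains a scheme untouched.

```php
<?php
/**
 * Hypothetical helper: turn "host:port" into "tcp://host:port".
 * Values that already contain a scheme ("...://") are returned as-is (trimmed).
 */
function normalize_proxy(string $proxy): string
{
    $proxy = trim($proxy);
    if (strpos($proxy, '://') !== false) {
        return $proxy;
    }
    return 'tcp://' . $proxy;
}

echo normalize_proxy('221.176.14.72:80'), PHP_EOL;
echo normalize_proxy('tcp://221.176.14.72:80'), PHP_EOL;
```

The result can be dropped straight into the `'proxy'` entry of the context array.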

Answer 1 (score: 2)

I managed to use cURL to fetch the page and feed it to PHP Simple HTML DOM Parser.

require_once('simple_html_dom.php');

$url = 'http://www.whatsmyip.org/';
$proxy = '00.000.000.80:80';

$options = array( 
    CURLOPT_PROXY          => $proxy,
    CURLOPT_HTTPPROXYTUNNEL => 0,
    CURLOPT_REFERER        => "http://www.google.com",
    CURLOPT_FOLLOWLOCATION => true,
    CURLOPT_RETURNTRANSFER => true,
    CURLOPT_USERAGENT      => "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1", 
    CURLOPT_CONNECTTIMEOUT => 20,
    CURLOPT_TIMEOUT        => 20,
    CURLOPT_MAXREDIRS      => 10,
    CURLOPT_HEADER         => false, // true would prepend the response headers to the HTML handed to the parser
); 

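$ch = curl_init( $url ); 
curl_setopt_array( $ch, $options ); 
$content = curl_exec( $ch ); 
// curl_exec() returns false on failure (e.g. an unreachable proxy)
if ( $content === false ) {
    die( 'cURL error: ' . curl_error( $ch ) );
}
curl_close( $ch );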

$dom = new simple_html_dom();
$dom->load($content,true,false);

echo '<pre>';
print_r($dom);
echo '</pre>';
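If the response headers are wanted as well (CURLOPT_HEADER set to true), they arrive prepended to the body in the string cURL returns, and only the body should go to the parser. `split_response()` below is a hypothetical helper sketching that split; in real code the header length would come from `curl_getinfo($ch, CURLINFO_HEADER_SIZE)`, which the PHP manual describes as the total size of all headers received. The response here is simulated so the sketch runs without network access.

```php
<?php
/**
 * Hypothetical helper: split a raw response returned with CURLOPT_HEADER => true
 * into its header block and its body, given the header length in bytes.
 */
function split_response(string $raw, int $headerSize): array
{
    return array(
        'headers' => substr($raw, 0, $headerSize),
        'body'    => (string) substr($raw, $headerSize),
    );
}

// Simulated raw response (headers + body), standing in for curl_exec() output.
$headers = "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n";
$raw = $headers . '<html><body><span id="ip">203.0.113.10</span></body></html>';

// In real code: $headerSize = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
$parts = split_response($raw, strlen($headers));
echo $parts['body'], PHP_EOL;
```

The `body` part is what would then be passed to `$dom->load()`.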