Extracting HTML data from a website with PHP

Time: 2015-04-04 18:11:00

Tags: php curl

I am using the code below to scrape the page https://billing.te.eg/Arabic/BillStatus.aspx?Acc=A4000917512

But I get this error: Fatal error: Call to a member function find() on a non-object in index.php

What I actually want is to extract the text inside span elements such as <span id="SpanPhoneNumber" dir="ltr">02-26981106</span><span id="SpanCurrentBalance">19.30</span>
include_once("simple_html_dom.php");
//use curl to get html content
function getHTML($url,$timeout)
{
       $ch = curl_init($url); // initialize curl with given url
       curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]); // set  useragent
       curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // write the response to a variable
       curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects if any
       curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to execute
       curl_setopt($ch, CURLOPT_FAILONERROR, 1); // stop when it encounters an error
       return @curl_exec($ch);
}
$html=getHTML("https://billing.te.eg/Arabic/BillStatus.aspx?Acc=A4000917512",10);
// Find all images on webpage
foreach($html->find("img") as $element)
echo $element->src . '<br>';

// Find all links on webpage
foreach($html->find("a") as $element)
echo $element->href . '<br>';

1 answer:

Answer 0 (score: 0)

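getHTML() returns the page as a plain string, not an object, so find() cannot be called on it. You forgot to pass that string through the str_get_html() function from the simple_html_dom library before using the $html variable: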

$response=getHTML("https://billing.te.eg/Arabic/BillStatus.aspx?Acc=A4000917512",10);
$html = str_get_html($response);
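
Since curl_exec() returns false when the request fails, it can also help to check the response before parsing it. The following is only a minimal sketch: requireHtml() is a hypothetical helper name, not part of simple_html_dom, and the values passed to it are stand-ins for a real cURL response.

```php
<?php
// Hypothetical helper (not part of simple_html_dom): validate the raw cURL
// result before handing it to a parser such as str_get_html().
function requireHtml($response)
{
    // With CURLOPT_RETURNTRANSFER set, curl_exec() returns false on failure.
    if ($response === false || trim($response) === '') {
        throw new RuntimeException('No HTML received; check curl_error() for details');
    }
    return $response;
}

// Usage with stand-in values instead of a live request.
try {
    requireHtml(false);
} catch (RuntimeException $e) {
    echo 'Fetch failed: ' . $e->getMessage() . "\n";
}

echo strlen(requireHtml('<html><body>ok</body></html>')) . " bytes of HTML\n";
```

In the question's code, this check would sit between the getHTML() call and str_get_html(), so that a failed request produces a readable message rather than a fatal error further down.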

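When working with https pages, you may also need to add the following inside the function (note that both options turn off certificate verification):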

curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false);
curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false);
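
To get at the span values the question asks about, the parsed document can be queried by element id. The sketch below is only one possible approach: it uses PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom so that it runs without extra files, the markup is the sample copied from the question rather than a live response, and spanTextById() is a hypothetical helper name.

```php
<?php
// Sample markup copied from the question; a real run would use the cURL response.
$response = '<html><body>'
    . '<span id="SpanPhoneNumber" dir="ltr">02-26981106</span>'
    . '<span id="SpanCurrentBalance">19.30</span>'
    . '</body></html>';

// Hypothetical helper: return the trimmed text of the first <span> with the
// given id, or null if there is none. Assumes $id contains no quote characters.
function spanTextById($html, $id)
{
    $doc = new DOMDocument();
    // Collect parser warnings internally; real-world HTML is rarely valid.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query('//span[@id="' . $id . '"]');
    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}

$phone = spanTextById($response, 'SpanPhoneNumber');
$balance = spanTextById($response, 'SpanCurrentBalance');
echo "Phone: $phone\nBalance: $balance\n";
```

With simple_html_dom, the equivalent lookup on the object returned by str_get_html() would presumably be $html->find('span[id=SpanPhoneNumber]', 0)->plaintext.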