Question

I'm using this code to fetch a website's HTML, but I'm having trouble getting the data between span elements, for example <span id="SpanPhoneNumber" dir="ltr">02-26981106</span> and <span id="SpanCurrentBalance">19.30</span>.

include_once("simple_html_dom.php");
function getHTML($url,$timeout)
{
       $ch = curl_init($url); // initialize curl with the given url
       curl_setopt($ch, CURLOPT_USERAGENT, $_SERVER["HTTP_USER_AGENT"]); // set the user agent
       curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the response instead of printing it
       curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true); // follow redirects, if any
       curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout); // max. seconds to wait while connecting
       curl_setopt($ch, CURLOPT_FAILONERROR, 1); // fail on HTTP error responses (status >= 400)
       curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, false); // skip certificate verification (insecure)
       curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, false); // skip host name verification (insecure)
       return @curl_exec($ch);
}
$response = getHTML("https://billing.te.eg/Arabic/BillStatus.aspx?Acc=A4000917512",10);
$html = str_get_html($response);

// My problem is here: I want to get the data between the span elements without a loop or an array
echo $html->find('span[id=SpanCategory]');

Answer 1

preg_match_all may help. I put together a simple example; since you're using what looks like a DOM reader/parser, this may or may not help you.

$html_original='<span id="SpanPhoneNumber" dir="ltr">02-26981106</span>';
$html_original .="<p>Some other info here in a paragraph tag</p>";
$html_original .='<span id="SpanCurrentBalance">19.30</span>';

$pattern = '/[0-9-\.]+/';
preg_match_all($pattern, $html_original, $matches);
print_r($matches);

Result:

Array
(
    [0] => Array
        (
            [0] => 02-26981106
            [1] => 19.30
        )

)

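Again, this is a crude regular expression that only captures the numbers; I didn't include any reference to the two span elements mentioned in your question.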
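
If the pattern should be tied to the two span IDs rather than to any run of digits, something along these lines might work. This is only a sketch: spanTextById is a hypothetical helper name, the sample markup is copied from the question, and it assumes the id attribute is written with double quotes and that the span contains no nested span.

```php
<?php
// Sample markup copied from the question; a real page would come from getHTML().
$html_original  = '<span id="SpanPhoneNumber" dir="ltr">02-26981106</span>';
$html_original .= '<p>Some other info here in a paragraph tag</p>';
$html_original .= '<span id="SpanCurrentBalance">19.30</span>';

// Hypothetical helper (not part of any library): returns the text inside
// <span id="$id" ...>...</span>, or null when no such span is found.
function spanTextById(string $html, string $id): ?string
{
    // Assumes double-quoted id attributes and no nested <span> inside the match.
    $pattern = '/<span\b[^>]*\bid="' . preg_quote($id, '/') . '"[^>]*>(.*?)<\/span>/s';
    if (preg_match($pattern, $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

$phone   = spanTextById($html_original, 'SpanPhoneNumber');
$balance = spanTextById($html_original, 'SpanCurrentBalance');

echo $phone, "\n";   // 02-26981106
echo $balance, "\n"; // 19.30
```

A regex like this is brittle against changes in the markup, so a DOM parser is usually the safer choice for anything beyond a quick extraction.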

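Finally, your $html->find call references an ID that I didn't see at the start of your post (you mentioned SpanPhoneNumber and SpanCurrentBalance); perhaps those are the IDs that should be used in the ->find call?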
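
Note also that find() with a single argument returns a list of matches, so echoing it will not print the span's text; Simple HTML DOM accepts an index as a second argument to get a single element. As a rough illustration of looking the spans up by ID without a loop, here is a sketch that swaps in PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom, so it runs without the library. firstSpanText is a hypothetical helper name and the markup is the sample from the question.

```php
<?php
// Sample markup from the question; in practice this would be the curl response.
$markup = '<span id="SpanPhoneNumber" dir="ltr">02-26981106</span>'
        . '<span id="SpanCurrentBalance">19.30</span>';

// Collect libxml warnings internally, since real-world HTML is rarely valid.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Hypothetical helper: text of the first <span> with the given id, or null.
// The id is inserted into the XPath as-is, so pass only fixed, trusted values.
function firstSpanText(DOMXPath $xpath, string $id): ?string
{
    $nodes = $xpath->query('//span[@id="' . $id . '"]');
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->textContent);
}

$phone   = firstSpanText($xpath, 'SpanPhoneNumber');
$balance = firstSpanText($xpath, 'SpanCurrentBalance');

echo $phone, "\n";   // 02-26981106
echo $balance, "\n"; // 19.30
```

The null return for a missing ID makes it easy to detect a wrong ID, which is the likely situation with SpanCategory if that element is not on the page.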
