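Possible duplicates:
Preg_match_all <a href
How to parse and process HTML with PHP?
I am using curl to fetch a page's source and need to extract some values from the curl output.
Part of the output looks like this: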
<div class="detailInfo">
<label>Manufacturer code/Gas council no:
</label>BKSWX5506</div>
<div class="detailInfo"></div>
<div class="detailInfo">
<div>
<label>Retail price:</label><span>£12.30</span>
</div>
<div>
<label>Net buying price:</label><span>£7.47</span>
</div>
</div>
From that output, I need the code that follows "Manufacturer code/Gas council no:" and both prices, each in a separate string.
Can anyone help me?
Thanks :)
Answer 0 (score: 1)
Try this:
<?php
$output = '<div class="detailInfo">
<label>Manufacturer code/Gas council no:
</label>BKSWX5506</div>
<div class="detailInfo"></div>
<div class="detailInfo">
<div>
<label>Retail price:</label><span>£12.30</span>
</div>
<div>
<label>Net buying price:</label><span>£7.47</span>
</div>
</div>';
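// Keep only the <label> tags, turn every opening tag into a closing one, then split on it.
// trim() removes the newlines left behind by the stripped <div> and <span> tags.
$outputArray = array_map('trim', explode("</label>", str_replace("<label>", "</label>", strip_tags($output, '<label>'))));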
echo "<pre>";
print_r($outputArray);
echo "</pre>";
exit;
?>
Output:
Array
(
[0] =>
[1] => Manufacturer code/Gas council no:
[2] => BKSWX5506
[3] => Retail price:
[4] => £12.30
[5] => Net buying price:
[6] => £7.47
)
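The question asks for the code and the two prices as separate strings, so the flat array still has to be turned into label/value pairs. The following is only a minimal sketch of one way to do that: it hardcodes the array exactly as printed above, and the names `$fields`, `$code`, `$retailPrice` and `$netPrice` are illustrative, not part of the original answer.

```php
<?php
// Array copied from the printed output above; element 0 is the empty string before the first label.
$outputArray = array(
    '',
    'Manufacturer code/Gas council no:',
    'BKSWX5506',
    'Retail price:',
    '£12.30',
    'Net buying price:',
    '£7.47',
);

$fields = array();
// Skip the empty leading element, then walk the rest two at a time (label, value).
foreach (array_chunk(array_slice($outputArray, 1), 2) as $pair) {
    if (count($pair) === 2) {
        // Use the label without its trailing colon as the key.
        $fields[rtrim($pair[0], ':')] = $pair[1];
    }
}

$code        = $fields['Manufacturer code/Gas council no'];
$retailPrice = $fields['Retail price'];
$netPrice    = $fields['Net buying price'];

echo $code, "\n", $retailPrice, "\n", $netPrice, "\n";
```

This relies on every label being followed by exactly one value; if the page ever emits a label with no value, the pairing shifts, which is one reason a DOM-based approach tends to be more robust.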
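Answer 1 (score: 0)
Here is a generic routine you can use to find the XPath of each piece of text you are looking for. It should get you started, since it also shows how to run an XPath query: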
$searches = array('BKSWX5506', '£12.30', '£7.47');
$doc = new DOMDocument();
$doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">'.$html);
$xp = new DOMXPath($doc);
foreach ($searches as $search)
{
    $expression = '//text()[contains(., "'.$search.'")]';
    $result = $xp->query($expression);
    foreach ($result as $found)
    {
        /* @var $found DOMNode */
        printf("%s: %s\n", $found->getNodePath(), $found->nodeValue);
    }
}
For the $html content you provided, it produces the following output:
/html/body/div[1]/text()[2]: BKSWX5506
/html/body/div[3]/div[1]/span/text(): £12.30
/html/body/div[3]/div[2]/span/text(): £7.47
Querying those paths returns the same information again:
$number = $xp->evaluate('string(/html/body/div[1]/text()[2])'); # BKSWX5506
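As you can see, XPath serves both purposes: first to analyze the document and locate specific values, and then to reuse the paths you collected as a pattern for extracting them.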
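Positional paths such as `/html/body/div[1]/text()[2]` break as soon as the page layout shifts. As a possible alternative, the sketch below looks each value up by its label text instead. It assumes the markup from the question, with each value being the node immediately after its `<label>`; `valueAfterLabel()` is a hypothetical helper introduced here for illustration, not something from the original answer.

```php
<?php
// Sample markup copied from the question; in practice $html would be the curl output.
$html = '<div class="detailInfo">
<label>Manufacturer code/Gas council no:
</label>BKSWX5506</div>
<div class="detailInfo"></div>
<div class="detailInfo">
<div>
<label>Retail price:</label><span>£12.30</span>
</div>
<div>
<label>Net buying price:</label><span>£7.47</span>
</div>
</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
// Same charset hint as in the answer above, so the pound sign is read as UTF-8.
$doc->loadHTML('<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">' . $html);
$xp = new DOMXPath($doc);

// Hypothetical helper: returns the text of the node that directly follows
// the first <label> whose text starts with $labelStart.
// $labelStart must not contain a double quote or a percent sign.
function valueAfterLabel(DOMXPath $xp, $labelStart)
{
    $query = sprintf(
        'string(//label[starts-with(normalize-space(.), "%s")]/following-sibling::node()[1])',
        $labelStart
    );
    return trim($xp->evaluate($query));
}

$code        = valueAfterLabel($xp, 'Manufacturer code');
$retailPrice = valueAfterLabel($xp, 'Retail price');
$netPrice    = valueAfterLabel($xp, 'Net buying price');

echo $code, "\n", $retailPrice, "\n", $netPrice, "\n";
```

`following-sibling::node()[1]` is used rather than `following-sibling::span` because the manufacturer code is a bare text node while the prices are wrapped in `<span>` elements; `string()` covers both cases.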