I found this function on Snipplr that fetches a div by a given attribute. I tried to use it, but it didn't work. Is something wrong with the way I'm using it?
http://snipplr.com/view.php?codeview&id=20987
function get_tag( $attr, $value, $xml, $tag=null ) {
    if( is_null($tag) )
        $tag = '\w+';
    else
        $tag = preg_quote($tag);
    $attr = preg_quote($attr);
    $value = preg_quote($value);
    $tag_regex = "/<(".$tag.")[^>]*$attr\s*=\s*".
                 "(['\"])$value\\2[^>]*>(.*?)<\/\\1>/";
    preg_match_all($tag_regex,
                   $xml,
                   $matches,
                   PREG_PATTERN_ORDER);
    return $matches[3];
}
I modified it to work with a URL, like this:
function get_tag( $attr, $value, $page, $tag=null ) {
    if( is_null($tag) )
        $tag = '\w+';
    else
        $tag = preg_quote($tag);
    $attr = preg_quote($attr);
    $value = preg_quote($value);
    $tag_regex = "/<(".$tag.")[^>]*$attr\s*=\s*".
                 "(['\"])$value\\2[^>]*>(.*?)<\/\\1>/";
    $page = file_get_contents($page);
    preg_match_all($tag_regex,
                   $page,
                   $matches,
                   PREG_PATTERN_ORDER);
    return $matches[3];
}

get_tag("class","weather","http://www.masrawy.com","div");
How do I use it correctly?
Answer (score: 2)
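Don't use regular expressions. Use something that can parse and query the DOM, such as DOMDocument, Zend_Dom_Query or SimpleHTMLDOM. A pattern like this one is fragile on real pages: it has no "s" modifier, so "." does not match newlines and any div whose content spans several lines is skipped; the lazy "(.*?)<\/\1>" stops at the first closing "</div>", so nested divs are cut off; and the value must follow the opening quote directly, so class="box weather" is never found.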
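To make those failure modes concrete, here is a small self-contained sketch that runs the same pattern get_tag() builds for ('class', 'weather', ..., 'div') against three hand-written snippets. The sample HTML and the helper name weather_pattern() are made up for illustration; nothing here fetches the real page.

```php
<?php
// Same pattern get_tag('class', 'weather', $html, 'div') builds, plus optional modifiers.
// weather_pattern() is a hypothetical helper used only for this demo.
function weather_pattern(string $modifiers = ''): string
{
    return '/<(div)[^>]*class\s*=\s*([\'"])weather\2[^>]*>(.*?)<\/\1>/' . $modifiers;
}

// 1. Content on several lines: without "s", "." never matches "\n".
$multiLine = "<div class=\"weather\">\n  Cairo 30C\n</div>";
$plainCount = preg_match_all(weather_pattern(), $multiLine, $plain);        // 0 matches
$dotAllCount = preg_match_all(weather_pattern('s'), $multiLine, $dotAll);   // 1 match

// 2. Nested divs: the lazy (.*?) stops at the FIRST "</div>", even with "s".
$nested = '<div class="weather"><div class="temp">30C</div> Sunny</div>';
preg_match_all(weather_pattern('s'), $nested, $nestedMatches);
$captured = $nestedMatches[3][0]; // '<div class="temp">30C'; " Sunny" is lost

// 3. Several classes: "weather" must directly follow the opening quote.
$multiClass = '<div class="box weather">Rain</div>';
$multiClassCount = preg_match_all(weather_pattern('s'), $multiClass, $mc); // 0 matches

echo "multi-line: $plainCount without s, $dotAllCount with s\n";
echo "nested capture: $captured\n";
echo "multi-class: $multiClassCount\n";
```

Adding the "s" modifier only rescues the first case; the nested and multi-class cases are exactly what a DOM parser handles for free.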
DOMDocument example:
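$dom = new DOMDocument();
$html = file_get_contents('http://www.masrawy.com');
// real-world HTML is rarely valid; keep libxml parse warnings out of the output
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$finder = new DOMXPath($dom);
$classname = "weather";
// match "weather" as a whole class token, so class="box weather" matches too
$nodes = $finder->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");
$extracted = array();
foreach ($nodes as $element)
{
    // convert to html string (saveHTML with a node argument needs PHP >= 5.3.6)
    $extracted[] = $element->ownerDocument->saveHTML($element);
}
// now iterate over extracted and output...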
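If you want to keep the calling convention from the question, the DOMXPath query can be wrapped in a function with the same argument order. The sketch below is a hypothetical get_tag_dom(), not part of any library; it takes an HTML string (so it can be tested without network access), returns each match's inner HTML the way $matches[3] did, and assumes $attr and $value contain no quote characters, since they are pasted straight into the XPath expression.

```php
<?php
/**
 * Hypothetical DOM-based replacement for the question's get_tag().
 * Same argument order, but the HTML is parsed instead of regex-matched.
 * Returns the inner HTML of every matching element.
 * Assumption: $attr and $value contain no quotes (they go into the XPath string).
 */
function get_tag_dom(string $attr, string $value, string $html, ?string $tag = null): array
{
    $dom = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $name = $tag === null ? '*' : $tag;
    // Whole-token match: " box weather " contains " weather ", " weatherman " does not.
    $query = sprintf(
        "//%s[contains(concat(' ', normalize-space(@%s), ' '), ' %s ')]",
        $name, $attr, $value
    );

    $results = [];
    foreach ($xpath->query($query) as $element) {
        // Concatenate the serialized children to get the inner HTML.
        $inner = '';
        foreach ($element->childNodes as $child) {
            $inner .= $dom->saveHTML($child);
        }
        $results[] = $inner;
    }
    return $results;
}

$html = '<html><body>'
    . '<div class="box weather"><div class="temp">30C</div> Sunny</div>'
    . '<div class="weatherman">not me</div>'
    . '</body></html>';

$divs = get_tag_dom('class', 'weather', $html, 'div');
foreach ($divs as $inner) {
    echo $inner, "\n";
}
```

Against the live site the call would be get_tag_dom('class', 'weather', file_get_contents('http://www.masrawy.com'), 'div'); note that file_get_contents() returns false on failure, so check its result before parsing.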
Zend_Dom_Query example:
// Zend Framework 1 must be on the include_path
require_once 'Zend/Dom/Query.php';
$html = file_get_contents("http://www.masrawy.com");
$dom = new Zend_Dom_Query($html);
// CSS selector for <div class="weather">
$results = $dom->query('div.weather');
$extracted = array();
foreach ($results as $element)
{
    // convert to html string
    $extracted[] = $element->ownerDocument->saveHTML($element);
}
// now iterate over extracted and output...