How do I use this function to get a div?

Time: 2012-01-14 22:03:46

Tags: php regex html

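I found this function on snipplr. It is supposed to fetch a div (or any tag) that has a given attribute value. I tried to use it, but it did not work. Is there something wrong with the way I am using it?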

http://snipplr.com/view.php?codeview&id=20987

function get_tag( $attr, $value, $xml, $tag=null ) {
  if( is_null($tag) )
    $tag = '\w+';
  else
    $tag = preg_quote($tag);

  $attr = preg_quote($attr);
  $value = preg_quote($value);

  $tag_regex = "/<(".$tag.")[^>]*$attr\s*=\s*".
                "(['\"])$value\\2[^>]*>(.*?)<\/\\1>/";

  preg_match_all($tag_regex,
                 $xml,
                 $matches,
                 PREG_PATTERN_ORDER);

  return $matches[3];
}

I changed it so that it takes a URL, like this:

function get_tag( $attr, $value, $page, $tag=null ) {
  if( is_null($tag) )
    $tag = '\w+';
  else
    $tag = preg_quote($tag);

  $attr = preg_quote($attr);
  $value = preg_quote($value);

  $tag_regex = "/<(".$tag.")[^>]*$attr\s*=\s*".
                "(['\"])$value\\2[^>]*>(.*?)<\/\\1>/";
  $page = file_get_contents($page);
  preg_match_all($tag_regex,
                 $page,
                 $matches,
                 PREG_PATTERN_ORDER);

  return $matches[3];
}


get_tag("class","weather","http://www.masrawy.com","div");

How do I use it correctly?

1 Answer:

Answer 0: (score: 2)

Don't use a regular expression for this. Use something that can parse and query the DOM, such as DOMDocument, Zend_Dom_Query, or SimpleHTMLDOM.
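To see where the regex approach goes wrong, here is a minimal sketch that runs the question's get_tag() against two inline strings instead of a fetched page. The sample markup is made up for illustration. The lazy `(.*?)` stops at the first closing tag, so a nested div truncates the result, and `.` does not match newlines without the `s` modifier, so a div spread over several lines is not found at all.

```php
<?php
// The pattern from the question, unchanged, applied to a string.
function get_tag($attr, $value, $xml, $tag = null) {
    $tag   = is_null($tag) ? '\w+' : preg_quote($tag);
    $attr  = preg_quote($attr);
    $value = preg_quote($value);

    $tag_regex = "/<(" . $tag . ")[^>]*$attr\s*=\s*" .
                 "(['\"])$value\\2[^>]*>(.*?)<\/\\1>/";

    preg_match_all($tag_regex, $xml, $matches, PREG_PATTERN_ORDER);
    return $matches[3];
}

// Hypothetical markup: a div that contains another div.
$nested = get_tag('class', 'weather',
    '<div class="weather"><div>Cairo</div> 25 C</div>', 'div');
print_r($nested);    // one entry, cut off at the inner </div>: "<div>Cairo"

// Hypothetical markup: the same kind of div, but spread over several lines.
$multiline = get_tag('class', 'weather',
    "<div class=\"weather\">\n  Cairo 25 C\n</div>", 'div');
print_r($multiline); // empty array: "." does not cross the newlines
```

Real pages almost always have nested, multi-line markup, which is why a parser is the safer tool here.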

DOMDocument example:

$dom = new DOMDocument();

$html = file_get_contents('http://www.masrawy.com');
$dom->loadHTML($html);

$finder = new DOMXPath($dom);
$classname="weather";
$nodes = $finder->query("//div[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]");

$extracted = array();
foreach($nodes as $element)
{
  // convert to html string
  $extracted[] = $element->ownerDocument->saveXML($element);
}

// now iterate over extracted and output...
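For trying the XPath query without a network request, the same idea can be sketched against an inline string. The markup below is hypothetical and only stands in for the fetched page. Two details are my own additions rather than part of the example above: `libxml_use_internal_errors(true)` keeps the parser from printing a warning for every bit of invalid HTML, and `saveHTML($element)` (which accepts a node from PHP 5.3.6 on) serializes as HTML rather than XML.

```php
<?php
// Hypothetical sample markup standing in for the downloaded page.
$html = '<html><body>'
      . '<div class="box weather"><div>Cairo</div> 25 C</div>'
      . '<div class="weather-map">map</div>'
      . '</body></html>';

libxml_use_internal_errors(true);   // collect parse warnings instead of printing them
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

$finder = new DOMXPath($dom);
$classname = 'weather';

// Pad the class attribute with spaces so "weather" only matches as a whole
// class name, not as part of "weather-map".
$nodes = $finder->query(
    "//div[contains(concat(' ', normalize-space(@class), ' '), ' $classname ')]"
);

$extracted = array();
foreach ($nodes as $element) {
    // Serialize the node, including its nested children, as HTML.
    $extracted[] = $dom->saveHTML($element);
}

print_r($extracted);
```

Only the first div is selected, and the nested div inside it comes back intact, which is the case the regex version cannot handle.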

Zend_Dom_Query example:

$html = file_get_contents("http://www.masrawy.com");

$dom = new Zend_Dom_Query($html);
$results = $dom->query('div.theCssClassName');

$extracted = array();
foreach($results as $element)
{
  // convert to html string
  $extracted[] = $element->ownerDocument->saveXML($element);
}

// now iterate over extracted and output...