Checking for schema.org markup with Zend_Dom_Query

Time: 2013-07-25 12:25:43

Tags: zend-framework schema.org zend-dom-query

I want to check whether a website contains schema.org markup. This is what I am doing:

$domain = 'http://agents.allstate.com/william-leahy-mount-prospect-il.html';
$client = new Zend_Http_Client();
$client->setUri($domain);
$response = $client->request();
$html = $response->getBody();
$dom = new Zend_Dom_Query($html);
$resultSchema = $dom->query('body');

foreach ($resultSchema as $r) {
    $data = $r->hasAttribute('itemprop');
    if ($data) {
        echo 'Yes';
    } else {
        echo 'No';
    }
}

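I can't figure out how to find this. Is this the right approach? The schema.org markup on a site can sit on any HTML element. How do I query all elements and find the ones that carry schema.org markup?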

1 answer:

Answer 0: (score: 0)

After a lot of searching and reading I found the answer. In case anyone is still looking for it, this is how it can be done.

$separator = '|';
$dbData = '';
$domain = 'http://agents.allstate.com/william-leahy-mount-prospect-il.html';

// Fetch the page and load it into Zend_Dom_Query.
$client = new Zend_Http_Client();
$client->setUri($domain);
$response = $client->request();
$html = $response->getBody();
$dom = new Zend_Dom_Query($html);

// Name: find each LocalBusiness item, then look for itemprop="name" inside it.
$result = $dom->queryXpath('//*[@itemtype="http://schema.org/LocalBusiness"]');
foreach ($result as $r) {
    if ($r->hasChildNodes()) {
        // Serialize the subtree and re-parse it so the query is limited to this item.
        $lbHtml = $r->C14N();
        $dom2 = new Zend_Dom_Query($lbHtml);
        $lbname = $dom2->queryXpath('//*[@itemprop="name"]');
        foreach ($lbname as $nameNode) {
            $name = $nameNode->nodeValue; // if there are several matches, the last one wins
        }
    }
}
$dbData .= 'name:' . (isset($name) ? $name : '') . $separator;

// Address: text content of the PostalAddress item.
$result = $dom->queryXpath('//*[@itemtype="http://schema.org/PostalAddress"]');
foreach ($result as $r) {
    $address = $r->nodeValue;
}
$dbData .= 'address:' . (isset($address) ? $address : '') . $separator;

// Telephone: any element with itemprop="telephone".
$result = $dom->queryXpath('//*[@itemprop="telephone"]');
foreach ($result as $r) {
    $telephone = $r->nodeValue;
}
$dbData .= 'telephone:' . (isset($telephone) ? $telephone : '') . $separator;

// Drop the trailing separator.
$dbData = trim($dbData, $separator);

$dbData will then hold a '|'-separated string with the name, address, and telephone values taken from the schema.org markup. Hope it helps!
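
For the original question (does the page contain any schema.org markup at all, on whatever element?), a more general check might look like the sketch below. It is only an illustration under a few assumptions: it uses PHP's built-in DOMDocument and DOMXPath rather than Zend_Dom_Query so that it runs on its own, the function name findSchemaOrgTypes is made up for this example, and $sample is invented test markup. It only looks at microdata itemtype attributes, so RDFa or JSON-LD markup would not be detected.

```php
<?php
// Hypothetical helper (not part of Zend Framework): returns the distinct
// schema.org item types declared anywhere in a non-empty HTML string.
function findSchemaOrgTypes(string $html): array
{
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // "//*" matches every element, so the markup may sit on any tag.
    $nodes = $xpath->query('//*[contains(@itemtype, "schema.org/")]');

    $types = [];
    foreach ($nodes as $node) {
        // An itemtype attribute may list several space-separated URLs.
        foreach (preg_split('/\s+/', trim($node->getAttribute('itemtype'))) as $type) {
            if (strpos($type, 'schema.org/') !== false) {
                $types[$type] = true; // array keys de-duplicate repeated types
            }
        }
    }
    return array_keys($types);
}

// Invented sample markup, used only to exercise the function.
$sample = '<html><body>'
    . '<div itemscope itemtype="http://schema.org/LocalBusiness">'
    . '<span itemprop="name">Example Agency</span>'
    . '<p itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">1 Main St</p>'
    . '</div></body></html>';

$types = findSchemaOrgTypes($sample);
echo $types ? 'Yes: ' . implode(', ', $types) : 'No', PHP_EOL;
```

With a live page, the $html returned by $response->getBody() above could be passed in instead of $sample. The same XPath expression should also work with $dom->queryXpath() if you prefer to stay with Zend_Dom_Query.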