How do I extract a date from HTML code in PHP?

Time: 2014-12-08 01:20:06

Tags: php html xpath domdocument

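I need some help extracting the date from the HTML code below. (This is only a snippet of the part I want to extract from; the source is a complete HTML page.)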

.... <span class="glyphicon glyphicon-comment" style="color:#ccc;"> </span>&nbsp;
<span style="font-family:'Open Sans', arial;font-size:11px!important;color:#ccc;">0</span>&nbsp;&nbsp;
<span class="glyphicon glyphicon-time" style="color:#ccc;"></span>&nbsp;
<span style="font-family:'Open Sans',arial;font-size:11px!important;color:#ccc;">December 6, 2014</span>
<span style="font-family:'Open Sans',arial;font-size:11px!important;color:#ccc;">2:00 am</span>
<span style="font-family:'Open Sans',arial;font-size:11px!important;color:#ccc;">Hits(6)</span>....

So I tried the following code, which uses PHP's DOM objects and XPath to find it, but it fails: the result has length zero. Why?

//libxml_use_internal_errors(true);
$dom_document = new DOMDocument(); // CREATE A NEW DOCUMENT
$dom_document->loadHTML(
    mb_convert_encoding($row['html'], 'HTML-ENTITIES', 'UTF-8')
); // LOAD THE STRING INTO THE DOCUMENT
$classname = "font-family:'Open Sans',arial;font-size:11px!important;color:#ccc;";
$xpath     = new DOMXPath($dom_document);
$results   = $xpath->query("//*[@span=\"" . $classname . "\"]");
var_dump($results);
if ($results->length > 0) {
    $date = $results->item(0)->nodeValue;
}
//libxml_use_internal_errors(false);

1 answer:

Answer 0 (score: 1)

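Your $classname is misleading: it does not hold a class name from the sample markup, but a CSS style rule, i.e. the value of the style attribute: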

$classname = "font-family:'Open Sans',arial;font-size:11px!important;color:#ccc;";

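Your query tests @span, which looks for an attribute named span. No element has one, so nothing matches. Search instead for nodes whose style attribute has that value: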

$results = $xpath->query("//*[@style=\"" . $classname . "\"]");
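For reference, here is a minimal, self-contained sketch of the corrected query. It assumes the markup looks exactly like the snippet in the question (the $html string below is a stand-in for $row['html']), and the final DateTime parsing step is an optional extra that the question did not ask for.

```php
<?php
// Stand-in for $row['html']: the relevant spans copied from the question.
$html = <<<'HTML'
<span class="glyphicon glyphicon-time" style="color:#ccc;"></span>&nbsp;
<span style="font-family:'Open Sans',arial;font-size:11px!important;color:#ccc;">December 6, 2014</span>
<span style="font-family:'Open Sans',arial;font-size:11px!important;color:#ccc;">2:00 am</span>
<span style="font-family:'Open Sans',arial;font-size:11px!important;color:#ccc;">Hits(6)</span>
HTML;

// Collect parser warnings instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// The value of the style attribute (not a class name).
$style = "font-family:'Open Sans',arial;font-size:11px!important;color:#ccc;";

$xpath   = new DOMXPath($dom);
// Match on @style; the value contains single quotes, so it is wrapped
// in double quotes inside the XPath expression.
$results = $xpath->query('//span[@style="' . $style . '"]');

$date = null;
$time = null;
if ($results->length >= 2) {
    $date = trim($results->item(0)->nodeValue); // first match: the date
    $time = trim($results->item(1)->nodeValue); // second match: the time
}

// Optional: turn the two strings into a DateTime object.
$parsed = ($date !== null)
    ? DateTime::createFromFormat('F j, Y g:i a', $date . ' ' . $time)
    : false;

if ($parsed !== false) {
    echo $parsed->format('Y-m-d H:i'), "\n";
}
```

Note that an exact comparison on @style is sensitive to whitespace. In the question's snippet, the span holding the comment count has a space after the comma ('Open Sans', arial), so it is not matched, and the first result is the date.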
