PHP Simple HTML DOM counts the wrong number of elements

Asked: 2014-08-26 19:10:02

Tags: php parsing dom curl simple-html-dom

With this code I want to count the dt elements with class "level3" inside one particular node:

include_once('simple_html_dom.php');
ini_set("memory_limit", "-1");
ini_set('max_execution_time', 1200);

function parseInit($url) {
  $ch = curl_init();
  $timeout = 0;
  curl_setopt($ch, CURLOPT_URL, $url);
  curl_setopt($ch, CURLOPT_SSL_VERIFYPEER, FALSE);     
  curl_setopt($ch, CURLOPT_SSL_VERIFYHOST, 2); 
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
  curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
  $data = curl_exec($ch);
  curl_close($ch);
  return $data;
}

$data = parseInit("https://www.smile-dental.de/index.php");
$html = new simple_html_dom();
$html = $html->load($data);
$struct = $html->find("dt.level1", 0)->next_sibling()->find("dt.level2", 0)->next_sibling()->find("dt.level3");
echo count($struct);
$html->clear();  
unset($html);

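But the result is wrong. The count should be 2, but I get 53 (the total number of dt elements with class "level3" inside the first dt node with class "level1"). Can you help me understand what the problem is?

Thanks in advance!

--- EDIT --- In general, I want to build the hierarchy of links (the left navigation bar). I wrote the function below, but it works incorrectly, perhaps because of the situation described above, though there may be other problems in the code as well.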

function get_links($struct) {
    static $iter = 1;
    $nav_left_links = $struct->find("dt.level".$iter);
    echo "<ul>";   
    foreach ($nav_left_links as $links) {
        echo "<li>".$links->find("a", 0)->href;
        echo str_pad('',4096)."\n";
        ob_flush();
        flush();
        usleep(500000);
        $iter++;
        if ($links->next_sibling() && count($links->next_sibling()->find("dt")) > 0) {
            get_links($links->next_sibling());
        } else {
            $iter--;
            if ($key == count($nav_left_links)) {
                break;
            } else {
                continue;   
            }
        }
        echo "</li>";  
    }
    echo "</ul>";
    $iter--;
}

$data = parseInit("https://www.smile-dental.de/index.php");
$html = new simple_html_dom();
$html = $html->load($data);
$struct = $html->find(".mod_vertical_dropmenu_142_inner", 0);
get_links($struct);
$html->clear();  
unset($html); 

Or, if anyone knows how to rewrite this code without PHP Simple HTML DOM, using classic parsing methods, I would be very grateful.
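For reference, the same traversal can be sketched with PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM. This is only a minimal sketch under assumptions: the markup in $sample is made up to mimic a nested dl/dt/dd menu and is not the real page, and hasClass() is a hypothetical helper, not part of any library. Each step queries relative to the node found in the previous step, which mirrors the find(...)->next_sibling() chain.

```php
<?php
// Made-up sample that mimics a nested dl/dt/dd navigation (not the real page).
$sample = <<<'HTML'
<dl>
  <dt class="level1 first"><a href="/a">A</a></dt>
  <dd class="level1">
    <dl>
      <dt class="level2"><a href="/a/b">B</a></dt>
      <dd class="level2">
        <dl>
          <dt class="level3"><a href="/a/b/1">One</a></dt>
          <dd class="level3"></dd>
          <dt class="level3"><a href="/a/b/2">Two</a></dt>
          <dd class="level3"></dd>
        </dl>
      </dd>
    </dl>
  </dd>
  <dt class="level1"><a href="/c">C</a></dt>
  <dd class="level1">
    <dl>
      <dt class="level2"><a href="/c/d">D</a></dt>
      <dd class="level2">
        <dl>
          <dt class="level3"><a href="/c/d/3">Three</a></dt>
          <dd class="level3"></dd>
        </dl>
      </dd>
    </dl>
  </dd>
</dl>
HTML;

// Hypothetical helper: XPath 1.0 test for "class attribute contains this token".
function hasClass(string $name): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $name ')";
}

libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
$doc = new DOMDocument();
$doc->loadHTML($sample);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

// First dt.level1 in document order, then its next element sibling.
$level1 = $xpath->query('//dt[' . hasClass('level1') . ']')->item(0);
$dd1    = $xpath->query('following-sibling::*[1]', $level1)->item(0);

// First dt.level2 inside that sibling, then its next element sibling.
$level2 = $xpath->query('.//dt[' . hasClass('level2') . ']', $dd1)->item(0);
$dd2    = $xpath->query('following-sibling::*[1]', $level2)->item(0);

// All dt.level3 below that node only, not in the whole document.
$level3 = $xpath->query('.//dt[' . hasClass('level3') . ']', $dd2);
echo $level3->length, "\n";
```

On this sample the relative query finds only the two level3 entries under the first branch, while the document contains three in total. Real pages with invalid markup may still parse differently, so the result should be checked against the actual HTML.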

1 answer:

Answer 0 (score: 0)

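Unfortunately, it looks like you have found a bug. I did some experiments, and even after correcting the validation errors, simple-html-dom does not traverse dl, dt, and dd elements correctly. When I used regular expressions to convert all dl elements to ul, and the dd and dt elements to li, I did get it working: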

$html->find("li.level1", 1)->find("li.level2", 1)->find("li.level3");

Result:
<li class="level3 off-nav-321-8120 notparent first"><span class="outer"> <span class="inner"> <a href="/index.php?option=com_virtuemart&amp;view=productdetails&amp;virtuemart_category_id=321&amp;virtuemart_product_id=8120"><span>Pro-Seal Versiegeler</span></a> </span> </span></li>
<li class="level3 off-nav-321-8120 notparent first"></li>
<li class="level3 off-nav-321-8122 notparent last"><span class="outer"> <span class="inner"> <a href="/index.php?option=com_virtuemart&amp;view=productdetails&amp;virtuemart_category_id=321&amp;virtuemart_product_id=8122"><span>Pro-Seal L.E.D. Versiegeler</span></a> </span> </span></li>
<li class="level3 off-nav-321-8122 notparent last"></li>
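
The regular-expression conversion itself is not shown above. A minimal sketch of what it could look like follows; the function name convertDefinitionLists() and the sample markup are assumptions for illustration, and the exact patterns used in the experiment are not known.

```php
<?php
// Hypothetical helper: rewrites definition-list tags as plain list tags,
// so the parser sees ul/li instead of dl/dt/dd. Attributes are left untouched.
function convertDefinitionLists(string $markup): string
{
    // <dl ...> and </dl>  become  <ul ...> and </ul>
    $markup = preg_replace('~<(/?)dl\b~i', '<${1}ul', $markup);

    // <dt ...>, <dd ...> and their closing tags  become  <li ...> and </li>
    return preg_replace('~<(/?)d[td]\b~i', '<${1}li', $markup);
}

// Small made-up sample standing in for the real page.
$sample = '<dl class="nav"><dt class="level1"><a href="/a">A</a></dt>'
        . '<dd class="level1"><dl><dt class="level2">B</dt></dl></dd></dl>';

$converted = convertDefinitionLists($sample);
echo $converted, "\n";
```

The converted string would then be passed to load() in place of the raw page. Because both dt and dd become li and keep their classes, a selector such as li.level3 matches the converted dd elements as well, which may be why empty li entries appear in the result above.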