PHP Simple HTML DOM Parser error

Time: 2017-09-14 12:59:51

Tags: php html

I have started writing a scraper for a site that will also have a crawler, since I need to go through some links, but I am getting this error:

  

    PHP Fatal error: Uncaught Error: Call to a member function find() on null in D:\Projekti\hemrank\simple_html_dom.php:1129
    Stack trace:
    #0 D:\Projekti\hemrank\scrapeit.php(37): simple_html_dom->find('ul')
    #1 D:\Projekti\hemrank\scrapeit.php(19): ScrapeIt->getAllAddresses()
    #2 D:\Projekti\hemrank\scrapeit.php(55): ScrapeIt->run()
    #3 {main} thrown in D:\Projekti\hemrank\simple_html_dom.php on line 1129

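When I var_dump the $html variable, I get the full HTML with all the tags and so on, which is why it seems strange that it says "Call to a member function find() on null" when $html actually has a value. Here is the part of the code that is not working: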

        $html = new simple_html_dom();
        $html->load_file($baseurl);
        if(empty($html)){echo "HTTP Response not received!<br/>\n";exit;}
        $links = array();
        foreach ($html->find('ul') as $ul) {
            if(!empty($ul) && (count($ul)>0))
            foreach ($ul->find('li') as $li) {
                if(!empty($li) && (count($li)>0))
                foreach ($li->find('a') as $a) {
                    $links[] = $a->href;
                }
                else
                    die("NOT AVAILABLE");
            }
        }

        return $links;

    }

Is this a common problem with PHP Simple HTML DOM Parser? Is there a solution, or should I switch to some other kind of scraping?

2 answers:

Answer 0: (score: 0)

I just looked up the lib you are using, and this is line 1129:

    return $this->root->find($selector, $idx, $lowercase);

So your error message is telling you that $this->root in that class is null, and therefore there is no find() method to call on it!

I am no expert on that lib, since I use the excellent DOMDocument to parse HTML, but hopefully this helps you understand what is going on.
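For comparison, here is a minimal sketch of collecting the same kind of links with PHP's built-in DOMDocument and DOMXPath. The $markup string is a hypothetical stand-in for the fetched page, and the XPath expression is only an assumption about what the nested ul/li/a loops in the question are meant to select.

```php
<?php
// Hypothetical sample markup standing in for the page fetched from $baseurl.
$markup = '<html><body><ul>'
    . '<li><a href="/first">First</a></li>'
    . '<li><a href="/second">Second</a></li>'
    . '<li>No link here</li>'
    . '</ul></body></html>';

$doc = new DOMDocument();

// Real-world HTML is rarely valid, so collect libxml warnings instead of printing them.
libxml_use_internal_errors(true);
$loaded = $doc->loadHTML($markup);
libxml_clear_errors();

$links = [];
if ($loaded) {
    $xpath = new DOMXPath($doc);
    // Select every <a> that has an href and sits inside a <li> inside a <ul>.
    foreach ($xpath->query('//ul//li//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
}

print_r($links);
```

Checking the return value of loadHTML() before querying gives an explicit failure path, rather than relying on empty() applied to the document object.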

Also, $html will never be empty in your code, because you already populated it when you instantiated it!
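The point about empty() can be illustrated with a small sketch. FakeDom below is a hypothetical stand-in written only for this example; it is not the real simple_html_dom class, and it merely assumes that the parser keeps its parsed tree in a root property that stays null when nothing was loaded.

```php
<?php
// Hypothetical stand-in for the parser object; NOT the real simple_html_dom class.
class FakeDom
{
    /** @var object|null Parsed tree; stays null when loading produced nothing. */
    public $root = null;

    public function load(string $markup): void
    {
        // Assumption for this sketch: an empty document leaves root unset.
        $this->root = ($markup === '') ? null : new stdClass();
    }
}

$html = new FakeDom();
$html->load(''); // simulate a load that failed or returned no content

// An object is never "empty" in PHP, so this check cannot detect the failure.
$objectLooksEmpty = empty($html);

// Inspecting the internal tree is what actually reveals the problem.
$rootIsMissing = ($html->root === null);

var_dump($objectLooksEmpty); // bool(false)
var_dump($rootIsMissing);    // bool(true)
```

Under that assumption, a guard written as if(empty($html)) never fires, and the first find() call is what ends up hitting the null root.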

Answer 1: (score: 0)

I suggest the following change:

Change $html->load_file($baseurl); to $html = file_get_html($baseurl);

On my VPS server it works with $html->load_file($baseurl); but on my dedicated local server it only works with $html = file_get_html($baseurl);

This solved my problem: "Call to a member function find() on null" in simple_html_dom.php on line 1129.