I've started writing a scraper for a site that will also have a crawler, because I need to go through some links, but I'm getting this error:
PHP Fatal error:  Uncaught Error: Call to a member function find() on null in D:\Projekti\hemrank\simple_html_dom.php:1129
Stack trace:
#0 D:\Projekti\hemrank\scrapeit.php(37): simple_html_dom->find('ul')
#1 D:\Projekti\hemrank\scrapeit.php(19): ScrapeIt->getAllAddresses()
#2 D:\Projekti\hemrank\scrapeit.php(55): ScrapeIt->run()
#3 {main}
  thrown in D:\Projekti\hemrank\simple_html_dom.php on line 1129
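When I var_dump the $html variable, I get the full HTML with all the tags etc., which is why I find it strange that it says "Call to a member function find() on null" when $html actually has a value. Here is the part of the code that isn't working: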
$html = new simple_html_dom();
$html->load_file($baseurl);
if (empty($html)) { echo "HTTP Response not received!<br/>\n"; exit; }
$links = array();
foreach ($html->find('ul') as $ul) {
    if (!empty($ul) && (count($ul) > 0))
        foreach ($ul->find('li') as $li) {
            if (!empty($li) && (count($li) > 0))
                foreach ($li->find('a') as $a) {
                    $links[] = $a->href;
                }
            else
                die("NOT AVAILABLE");
        }
}
return $links;
} // end of getAllAddresses()
Is this a common problem with PHP Simple HTML DOM Parser? Is there a solution, or should I switch to some other kind of scraping?
Answer 0 (score: 0)
I just looked up the library you're using; this is line 1129:
return $this->root->find($selector, $idx, $lowercase);
So your error message is telling you that $this->root inside that class is null, so there is no find() method to call on it!
I'm no expert on this library, since I use the excellent DOMDocument for parsing HTML, but hopefully this helps you understand what's going on.
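Since the question also asks whether to switch to a different kind of scraping, here is a minimal sketch of the same ul > li > a link collection using PHP's built-in DOMDocument and DOMXPath instead of simple_html_dom. It assumes the ext-dom extension is enabled and that the page markup has already been fetched into a string; the function name extractListLinks() is hypothetical.

```php
<?php
// Collect the href of every <a> below an <li> below a <ul>,
// mirroring the nested find('ul') / find('li') / find('a') loops.
// extractListLinks() is a hypothetical helper name.
function extractListLinks(string $html): array
{
    // loadHTML() rejects an empty string (ValueError in PHP 8), so bail out early.
    if ($html === '') {
        return [];
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings
    // internally instead of printing them, then restore the old setting.
    $previous = libxml_use_internal_errors(true);
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return [];
    }

    $xpath = new DOMXPath($doc);
    $links = [];
    // Any <a> with an href somewhere under an <li> that is somewhere under a <ul>.
    foreach ($xpath->query('//ul//li//a[@href]') as $a) {
        $links[] = $a->getAttribute('href');
    }
    return $links;
}

$page = '<ul><li><a href="/a">A</a></li><li><a href="/b">B</a></li></ul>'
      . '<p><a href="/not-in-list">C</a></p>';
print_r(extractListLinks($page));
```

Because the parser works on a string you fetched yourself, a failed download shows up as an empty value you can check before parsing, rather than as a null object deep inside a library.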
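Also, $html will never be empty in your code: you already populated it when you instantiated it with new simple_html_dom(), and empty() on an object always returns false!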
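To illustrate the point about empty(): any object is truthy in PHP, so the "HTTP Response not received!" guard in the question can never fire. A check that can actually fail has to test the fetched markup itself. In this sketch stdClass stands in for the simple_html_dom object so the snippet runs on its own, and fetchMarkup() is a hypothetical helper, not part of simple_html_dom.

```php
<?php
// Any object is truthy, so empty() on it is always false.
$html = new stdClass(); // stand-in for `new simple_html_dom()`
var_dump(empty($html)); // bool(false)

// A guard that can fail: check the raw markup before parsing it.
// fetchMarkup() is hypothetical; it returns null when nothing usable was fetched.
function fetchMarkup(string $url): ?string
{
    $raw = @file_get_contents($url); // false on failure; @ hides the warning
    return ($raw === false || $raw === '') ? null : $raw;
}

$markup = fetchMarkup('/nonexistent/page.html');
if ($markup === null) {
    echo "HTTP Response not received!\n";
}
```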
Answer 1 (score: 0)
I suggest changing
$html->load_file($baseurl);
to
$html = file_get_html($baseurl);
On my VPS it works with $html->load_file($baseurl);, but on my dedicated local server it only works with $html = file_get_html($baseurl);.
This solved my problem:
- Call to a member function find() on null
- in simple_html_dom.php on line 1129