Question

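I am using Simple HTML DOM to parse HTML and remove the page menu and footer (for example, I chose http://codex.buddypress.org/developer-docs/the-bp-global/, but it may be other URLs later). But my code returns Fatal error: Call to a member function find() on a non-object. What is wrong? Thanks.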

require('simple_html_dom.php');
$webch = curl_init();
curl_setopt($webch, CURLOPT_URL, "http://codex.buddypress.org/developer-docs/the-bp-global/");
curl_setopt($webch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($webch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.2) Gecko/20090729 Firefox/3.5.2 GTB5');
$htmls = curl_exec($webch);
curl_close($webch);
$html = str_get_html($htmls);
$html = preg_replace('#<div(.*?)id="(.*?)head(.*?)"(.*?)>.*</div>#is', '', $html);
$html = preg_replace('#<div(.*?)class="(.*?)head(.*?)"(.*?)>.*</div>#is', '', $html);
$html = preg_replace('#<div(.*?)id="(.*?)menu(.*?)"(.*?)>.*</div>#is', '', $html);
$html = preg_replace('#<div(.*?)class="(.*?)menu(.*?)"(.*?)>.*</div>#is', '', $html);
$html = preg_replace('#<div(.*?)id="(.*?)foot(.*?)"(.*?)>.*</div>#is', '', $html);
$html = preg_replace('#<div(.*?)class="(.*?)foot(.*?)"(.*?)>.*</div>#is', '', $html);
foreach($html->find('a') as $element){
   echo $element.'<hr />';
}

Answer 1

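str_get_html seems to be a function from HTML DOM Parser. What it returns is an object, not a string, yet a string is what you are treating it as. preg_replace expects a string as input and returns a string, which you then assign to $html.

Your problem is that you call $html->find, which means you expect $html to be the object returned by str_get_html. It no longer is, because you overwrote it with the string returned by preg_replace.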

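What you probably want is one of these two things:

Do your string processing (the preg_replace calls) before you call str_get_html (that is, before $html = str_get_html($htmls);). After that statement $html is no longer a string, and any string processing you do on it will be useless and wrong.
Do whatever you need with the actual tools provided by the library you are using (Simple HTML DOM Parser, as far as Google tells me). For example, something like $html->find('div.menu', 0)->class = '';.

I would recommend the second option (if it does what you want), because processing HTML with regular expressions is not really a good idea.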
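To illustrate the second option without regular expressions, here is a minimal sketch. It deliberately swaps Simple HTML DOM for PHP's built-in DOMDocument and DOMXPath so that it runs without any extra file. The helper name stripDivs, the keyword list and the sample markup are assumptions made up for this example, not part of the question's code, and the keywords are assumed to contain no double quotes.

```php
<?php
// Sketch only: remove header/menu/footer <div>s through the DOM,
// then list the links that remain. stripDivs() is a hypothetical helper.

function stripDivs(string $source, array $keywords): DOMDocument
{
    $doc = new DOMDocument();
    // Real pages are rarely valid HTML; collect parser warnings silently.
    libxml_use_internal_errors(true);
    $doc->loadHTML($source);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    foreach ($keywords as $word) {
        // Match every <div> whose id or class contains the keyword.
        $query = sprintf('//div[contains(@id, "%1$s") or contains(@class, "%1$s")]', $word);
        // Take a snapshot of the matches before removing them.
        foreach (iterator_to_array($xpath->query($query)) as $div) {
            if ($div->parentNode !== null) {
                $div->parentNode->removeChild($div);
            }
        }
    }
    return $doc;
}

// Made-up sample markup standing in for the downloaded page.
$sample = '<html><body>'
    . '<div id="main-menu"><a href="/home">Home</a></div>'
    . '<div class="content"><a href="/docs">Docs</a></div>'
    . '<div class="site-footer"><a href="/about">About</a></div>'
    . '</body></html>';

$doc = stripDivs($sample, ['head', 'menu', 'foot']);

// $doc is still an object, so it can be queried after the clean-up.
$links = [];
foreach ($doc->getElementsByTagName('a') as $a) {
    $links[] = $a->getAttribute('href');
}
echo implode("\n", $links), "\n"; // prints "/docs"
```

With the question's code, the string returned by curl_exec ($htmls) would take the place of $sample. The point of the design is that the value being queried stays an object the whole time, which avoids the "find() on a non-object" error.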

php preg_replace html menu, footer
