Question

I stored the HTML of a few pages in a MySQL database using simple_html_dom, like this:

scraper.php

<?php
require('simple_html_dom.php');
mysql_connect("localhost", "root", "") or die(mysql_error());
mysql_select_db("dbname") or die(mysql_error());

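$url = 'someurl.html';
// file_get_html() returns a simple_html_dom object for the fetched page
$html = file_get_html($url);
// Escaping casts the object to its HTML string, so plain text is what gets stored
$html = mysql_real_escape_string($html);
$query = "INSERT INTO tablename (id, file_get_html) VALUES (NULL, '$html')";
mysql_query($query);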

?>

When I echo the data that was inserted into the database, I get the exact scraped page.

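But when I try to extract the page's h1 heading from the HTML stored in the database, I get

Fatal error: Call to a member function find() on a non-object

on this line: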

$h1 = trim($html->find('h1', 0)->plaintext);

Here is the full code:

parse_data.php

<?php
    require('simple_html_dom.php');
    mysql_connect("localhost", "root", "") or die(mysql_error());
    mysql_select_db("dbname") or die(mysql_error());

    $result = mysql_query("select file_get_html from tablename where id = 1");
    while ($row = mysql_fetch_assoc($result)){
    $html = $row['file_get_html'];
    }

    $h1 = trim($html->find('h1', 0)->plaintext);
    $title = trim($h1);
    echo $title ;

?>

I do this so that I don't have to scrape the remote page every time I run a test.

How can I get the contents of the h1 tag using simple_html_dom and the HTML data stored in the database?

Answer 1

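The value you read back from the database is a plain string, not a simple_html_dom object, which is why calling find() on it fails. There is also a function called str_get_html, which loads HTML from a string variable and parses it with simple_html_dom: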

<?php
    require('simple_html_dom.php');
    mysql_connect("localhost", "root", "") or die(mysql_error());
    mysql_select_db("dbname") or die(mysql_error());

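    $result = mysql_query("select file_get_html from tablename where id = 1");
    while ($row = mysql_fetch_assoc($result)){
        // The column holds a plain string, so parse it back into a DOM object
        $html = str_get_html($row['file_get_html']);
    }

    // find('h1', 0) returns the first <h1> element; plaintext is its text content
    $h1 = trim($html->find('h1', 0)->plaintext);
    $title = trim($h1);
    echo $title;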

?>

Note that in the code above I just replaced file_get_html with str_get_html.
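
To illustrate why the stored value has to be parsed again, here is a minimal, self-contained sketch. It does not use simple_html_dom or a database: `ParsedPage` and its `firstText()` method are hypothetical stand-ins invented for this example, and the regex lookup is only a toy substitute for a real HTML parser. The point it tries to show is that once an object is cast to a string (which is all a text column can hold), the methods are gone until the string is parsed into an object again.

```php
<?php
// Hypothetical stand-in for a parsed document object. It is NOT the
// simple_html_dom API; it only mimics "an object that casts to its markup".
final class ParsedPage
{
    private $markup;

    public function __construct($markup)
    {
        $this->markup = $markup;
    }

    // Return the trimmed text of the first <$tag> element, or null if none.
    // A regex is used only to keep the sketch dependency-free.
    public function firstText($tag)
    {
        $name = preg_quote($tag, '~');
        $pattern = '~<' . $name . '\b[^>]*>(.*?)</' . $name . '>~is';
        if (preg_match($pattern, $this->markup, $m)) {
            return trim(strip_tags($m[1]));
        }
        return null;
    }

    // Casting to string yields the raw markup, i.e. what a text column stores.
    public function __toString()
    {
        return $this->markup;
    }
}

$page   = new ParsedPage('<html><body><h1> Hello </h1></body></html>');
$stored = (string) $page;               // simulates the value saved in the table

// $stored is a plain string now, so $stored->firstText('h1') would be a fatal error.
$isObject = is_object($stored);

// Re-parse the string into an object before querying it.
$reparsed = new ParsedPage($stored);
$heading  = $reparsed->firstText('h1');
echo $heading, "\n";
```

In the real code, str_get_html() plays the role of the second `new ParsedPage(...)` call: it turns the string fetched from the database back into an object that has find().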

Why can't simple_html_dom handle HTML stored in the DB?