Question

I've found various solutions for "scraping" websites, but they don't seem to be quite what I'm after.

I'd like to extract data from every file in an online directory and save the results to a MySQL database:

http://www.website.com/directory/subdirectory/

This subdirectory contains several different sub-subdirectories, which hold the information I'm looking for.
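One way to handle the directory part, assuming the server returns an auto-generated index page (Apache/nginx style) in which each folder appears as a relative link ending in "/", and assuming each sub-subdirectory URL serves the page described below: read the index, keep only the folder links, then visit each one. The function name listSubdirectories is hypothetical, and the sketch uses PHP's DOM extension (enabled in most PHP builds) rather than any particular scraping library.

```php
<?php
// Minimal sketch: collect the sub-subdirectory URLs from a directory index page.
// Assumption: folders are listed as relative links such as <a href="name/">.
function listSubdirectories(string $indexHtml, string $baseUrl): array
{
    if ($indexHtml === '') {
        return []; // loadHTML() rejects an empty string
    }
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world HTML
    $doc->loadHTML($indexHtml);
    libxml_clear_errors();

    $urls = [];
    foreach ($doc->getElementsByTagName('a') as $link) {
        $href = $link->getAttribute('href');
        // Folders only: the link must end in "/".
        if ($href === '' || substr($href, -1) !== '/') {
            continue;
        }
        // Skip absolute paths (e.g. the parent directory), sort links and "../".
        if ($href[0] === '/' || $href[0] === '?' || strpos($href, '..') === 0) {
            continue;
        }
        // Skip full URLs such as http://other.site/dir/.
        if (preg_match('#^[a-z][a-z0-9+.-]*:#i', $href)) {
            continue;
        }
        $urls[] = rtrim($baseUrl, '/') . '/' . $href;
    }
    return array_values(array_unique($urls));
}

// Hypothetical usage (fetching http:// URLs this way needs allow_url_fopen):
// $base = 'http://www.website.com/directory/subdirectory/';
// foreach (listSubdirectories(file_get_contents($base), $base) as $url) {
//     $pageHtml = file_get_contents($url);
//     // ...extract the fields and store them...
// }
```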

Each of these sub-subdirectories contains the following elements, holding the data I want to store:

<h1 class="title">Title</h1>
<h2 class="details">Details</h2>

Then two paragraph tags with additional data:

<p>Text</p>
<p>More Text</p>

And finally:

<h3>Title</h3>
<p>Text</p>
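The layout above maps fairly directly onto XPath queries. A minimal sketch with PHP's DOMDocument and DOMXPath, assuming every page has exactly this structure (h1.title, h2.details, two paragraphs, then an h3 followed by a third paragraph); parseDetailPage and the array keys are illustrative names, not anything the site defines.

```php
<?php
// Minimal sketch: extract the six fields from one page.
// Assumption: the page layout matches the description exactly.
function parseDetailPage(string $html): array
{
    $doc = new DOMDocument();
    if ($html !== '') {                   // loadHTML() rejects an empty string
        libxml_use_internal_errors(true); // tolerate sloppy markup
        $doc->loadHTML($html);
        libxml_clear_errors();
    }
    $xpath = new DOMXPath($doc);

    // Trimmed text of the first node matching $query, or '' if nothing matches.
    $text = function (string $query) use ($xpath): string {
        $nodes = $xpath->query($query);
        return ($nodes !== false && $nodes->length > 0) ? trim($nodes->item(0)->textContent) : '';
    };

    return [
        // Class test that still works if the element has extra classes.
        'title'       => $text('//h1[contains(concat(" ", normalize-space(@class), " "), " title ")]'),
        'details'     => $text('//h2[contains(concat(" ", normalize-space(@class), " "), " details ")]'),
        'detailText1' => $text('(//p)[1]'),                      // first <p> in the document
        'detailText2' => $text('(//p)[2]'),                      // second <p>
        'title2'      => $text('//h3'),
        'title2Text'  => $text('//h3/following-sibling::p[1]'), // the <p> right after <h3>
    ];
}
```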

Ideally, I'd like to store each piece of text in the database, something like:

$title = //all text between <h1> and </h1>;
$details = //all text between <h2> and </h2>;
$detailText1 = //all text between the FIRST set of <p> and </p>
$detailText2 = //all text between the SECOND set of <p> and </p>
$title2 = //all text between <h3> and </h3>;
$title2Text = //all text between the THIRD set of <p> and </p>;

mysql_query('INSERT INTO table (id, title, details, detailText1, detailText2, title2, title2Text) VALUES (NULL, "'.$title.'", "'.$details.'", "'.$detailText1.'", "'.$detailText2.'", "'.$title2.'", "'.$title2Text.'")');

Any help is greatly appreciated.
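A note on the insert itself: mysql_query() was deprecated in PHP 5.5 and removed in PHP 7.0, and concatenating scraped text into SQL breaks as soon as a page contains a quote character (it is also open to SQL injection). A minimal sketch using a PDO prepared statement instead. The table name pages (TABLE is a reserved word in MySQL), the function name saveRecord, and the connection details in the comment are all assumptions; id is assumed to be AUTO_INCREMENT, so it is left out of the column list.

```php
<?php
// Minimal sketch: store one scraped record with a prepared statement.
// Assumption: a table `pages` with an AUTO_INCREMENT id and these six columns.
function saveRecord(PDO $pdo, array $row): void
{
    $stmt = $pdo->prepare(
        'INSERT INTO pages (title, details, detailText1, detailText2, title2, title2Text)
         VALUES (:title, :details, :detailText1, :detailText2, :title2, :title2Text)'
    );
    // Values are sent separately from the SQL, so quotes in the text cannot break the query.
    $stmt->execute([
        ':title'       => $row['title'],
        ':details'     => $row['details'],
        ':detailText1' => $row['detailText1'],
        ':detailText2' => $row['detailText2'],
        ':title2'      => $row['title2'],
        ':title2Text'  => $row['title2Text'],
    ]);
}

// Hypothetical MySQL connection (host, database name and credentials are placeholders):
// $pdo = new PDO('mysql:host=localhost;dbname=scraper;charset=utf8mb4', 'user', 'password',
//                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// saveRecord($pdo, $fields);
```

Preparing once outside a loop and calling execute() per page would avoid re-parsing the statement when many pages are saved; the per-call version is kept here for brevity.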

Answer 1

Use Simple HTML DOM to get the h1, h2, ... or any other tags.
For example:

require_once 'simple_html_dom.php';
// Create DOM from URL or file
$html = file_get_html('http://www.wikipedia.org');
// Find all h1
foreach($html->find('h1') as $element) 
       echo $element->outertext. '<br/>';

Save specified strings from all files in a directory to a MySQL database