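I know we can use PHP's DOM extension to parse HTML in PHP, and I found many questions about it on Stack Overflow. But I have a specific requirement. I have HTML content like the following: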
<p class="Heading1-P">
<span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 3</span>
</p>
I want to parse the HTML above and save the content into two different arrays, $heading and $content, like this:
$heading = array('Chapter 1','Chapter 2','Chapter 3');
$content = array('This is chapter 1','This is chapter 2','This is chapter 3');
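I could easily do this with jQuery, but I'm not sure whether that's the right way. It would be great if someone could point me in the right direction. Thanks in advance.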
Answer 0 (score: 13)
I used DOMDocument and DOMXPath to get a solution; here is the code:
<?php
$dom = new DomDocument();
$test='<p class="Heading1-P">
<span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 3</span>
</p>';
$dom->loadHTML($test);
$xpath = new DOMXpath($dom);
$heading=parseToArray($xpath,'Heading1-H');
$content=parseToArray($xpath,'Normal-H');
var_dump($heading);
echo "<br/>";
var_dump($content);
echo "<br/>";
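function parseToArray($xpath, $class)
{
    // Select every <span> whose class attribute is exactly $class
    $xpathquery = "//span[@class='" . $class . "']";
    $elements = $xpath->query($xpathquery);
    $resultarray = array();
    // DOMXPath::query() returns false (never null) on an invalid expression
    if ($elements === false) {
        return $resultarray;
    }
    foreach ($elements as $element) {
        // textContent joins all text inside the <span>: one entry per span
        $resultarray[] = $element->textContent;
    }
    return $resultarray;
}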
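A variation on the same DOMDocument/DOMXPath approach, sketched here under the assumption that you want both arrays filled from a single query: the XPath union of the two class tests returns the spans in document order, so $heading[$i] and $content[$i] stay paired by position as long as every heading is followed by its content paragraph. The sample string is a shortened copy of the question's HTML.

```php
<?php
// Shortened copy of the question's markup
$html = '<p class="Heading1-P"><span class="Heading1-H">Chapter 1</span></p>
<p class="Normal-P"><span class="Normal-H">This is chapter 1</span></p>
<p class="Heading1-P"><span class="Heading1-H">Chapter 2</span></p>
<p class="Normal-P"><span class="Normal-H">This is chapter 2</span></p>
<p class="Heading1-P"><span class="Heading1-H">Chapter 3</span></p>
<p class="Normal-P"><span class="Normal-H">This is chapter 3</span></p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$heading = array();
$content = array();

// One query for both classes; the node list comes back in document order
$spans = $xpath->query("//span[@class='Heading1-H' or @class='Normal-H']");
foreach ($spans as $span) {
    $text = trim($span->textContent);
    // Route each span to the matching array by its class attribute
    if ($span->getAttribute('class') === 'Heading1-H') {
        $heading[] = $text;
    } else {
        $content[] = $text;
    }
}

print_r($heading);
print_r($content);
```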
Answer 1 (score: 7)
Try the PHP Simple HTML DOM Parser.
It has an excellent jQuery-like syntax, so you can easily select any element you want by ID or class.
// include/require the simple html dom parser file
$html_string = '
<p class="Heading1-P">
<span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 3</span>
</p>';
$html = str_get_html($html_string);
// Initialize both arrays so they exist even when nothing matches
$heading = array();
$content = array();
foreach ($html->find('span') as $element) {
    if ($element->class === 'Heading1-H') {
        $heading[] = $element->innertext;
    } elseif ($element->class === 'Normal-H') {
        $content[] = $element->innertext;
    }
}
Answer 2 (score: 2)
Here is another way to parse the HTML, using DiDOM,
which offers significantly better performance in terms of speed and memory usage.
composer require imangazaliev/didom
<?php
use DiDom\Document;
require_once('vendor/autoload.php');
$html = <<<HTML
<p class="Heading1-P">
<span class="Heading1-H">Chapter 1</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 1</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 2</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 2</span>
</p>
<p class="Heading1-P">
<span class="Heading1-H">Chapter 3</span>
</p>
<p class="Normal-P">
<span class="Normal-H">This is chapter 3</span>
</p>
HTML;
$document = new Document($html);
// find chapter headings
$elements = $document->find('.Heading1-H');
$headings = [];
foreach ($elements as $element) {
$headings[] = $element->text();
}
// find chapter texts
$elements = $document->find('.Normal-H');
$chapters = [];
foreach ($elements as $element) {
$chapters[] = $element->text();
}
echo("Headings\n");
foreach ($headings as $heading) {
echo("- {$heading}\n");
}
echo("Chapter texts\n");
foreach ($chapters as $chapter) {
echo("- {$chapter}\n");
}
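Answer 3 (score: 1)
You can use DOMDocument and DOMXPath. They do have a bit of a learning curve, but once you get past it, you'll be very happy with what you can achieve.
Read the following on php.net: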
http://php.net/manual/en/class.domdocument.php
http://php.net/manual/en/class.domxpath.php
Hope this helps.
Answer 4 (score: -7)
// Create DOM from URL or file
$html = file_get_html('http://www.google.com/');
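As a small taste of what DOMXPath allows beyond an exact `@class='...'` match, here is a sketch that matches a class name as one whitespace-separated token, so an element with `class="Heading1-H extra"` is still found. The helper name `textsByClass` is hypothetical, and it assumes the class name contains no quote characters, since it is inserted into the query string unescaped.

```php
<?php
// Hypothetical helper: trimmed text of every element whose class attribute
// contains $className as one of its whitespace-separated tokens.
function textsByClass(DOMXPath $xpath, string $className): array
{
    // Pad both sides with spaces so ' Heading1-H ' cannot match 'Heading1-Hx'
    $query = "//*[contains(concat(' ', normalize-space(@class), ' '), ' "
        . $className . " ')]";
    $result = array();
    foreach ($xpath->query($query) as $node) {
        $result[] = trim($node->textContent);
    }
    return $result;
}

$dom = new DOMDocument();
// The heading span carries an extra class that an exact @class match would miss
$dom->loadHTML('<p class="Heading1-P"><span class="Heading1-H extra">Chapter 1</span></p>'
    . '<p class="Normal-P"><span class="Normal-H">This is chapter 1</span></p>');
$xpath = new DOMXPath($dom);

$heading = textsByClass($xpath, 'Heading1-H');
$content = textsByClass($xpath, 'Normal-H');
print_r($heading);
print_r($content);
```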
// Find all images
foreach($html->find('img') as $element)
echo $element->src . '<br>';
// Find all links
foreach($html->find('a') as $element)
echo $element->href . '<br>';
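For comparison, a minimal sketch of the same image and link listing using PHP's built-in DOM extension instead of Simple HTML DOM. It parses an inline sample string (an assumption, to avoid a network request); `$dom->loadHTMLFile()` could read a local file instead.

```php
<?php
$dom = new DOMDocument();
// Real pages are often not valid HTML; collect parser warnings quietly
libxml_use_internal_errors(true);
$dom->loadHTML('<p><img src="a.png"><a href="/one">One</a> <a href="/two">Two</a></p>');
libxml_clear_errors();

// Find all images and print their src attributes
$srcs = array();
foreach ($dom->getElementsByTagName('img') as $img) {
    $srcs[] = $img->getAttribute('src');
    echo $img->getAttribute('src') . '<br>';
}

// Find all links and print their href attributes
$hrefs = array();
foreach ($dom->getElementsByTagName('a') as $a) {
    $hrefs[] = $a->getAttribute('href');
    echo $a->getAttribute('href') . '<br>';
}
```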