Question

I have to parse an HTML file with regular expressions to get the heading inside a div tag. This is the HTML I am trying to parse:

<div class="descriptionArea-2" style="visibility: visible;">
<img src="(image Url Here)" />
<br />
<h2>"Product Title"</h2>
        <div class="displayDescription">"product description here."<div class="icons">icons</div></div>

</div>

I have spent a lot of time trying to get the "Product Title" and the "product description".

Answer 1

I don't know how general the page is, but these expressions could work:

Product name:

/<h2>"(.*)"<\/h2>/

Description:

/<div class="displayDescription">"(.*)"<div class="icons">/

A possibly more general way to match the description:

/<div class="displayDescription">([^<]*)/

Use preg_match(_all) to get the values you need:

preg_match_all('/<h2>"(.*)"<\/h2>/', $string, $matches);
echo $matches[1][0]; // gets the first title
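To make this concrete, here is a minimal sketch that runs the title and description expressions above against the markup from the question. The variable names ($html, $title, $description) are placeholders of mine, and it assumes the values are always wrapped in double quotes as in the sample.

```php
<?php
// Sample markup copied from the question; $html is a hypothetical variable name.
$html = <<<'HTML'
<div class="descriptionArea-2" style="visibility: visible;">
<img src="(image Url Here)" />
<br />
<h2>"Product Title"</h2>
        <div class="displayDescription">"product description here."<div class="icons">icons</div></div>

</div>
HTML;

$title = null;
$description = null;

// preg_match returns 1 on a match, 0 on no match, false on error.
if (preg_match('/<h2>"(.*)"<\/h2>/', $html, $m) === 1) {
    $title = $m[1];
}
if (preg_match('/<div class="displayDescription">"(.*)"<div class="icons">/', $html, $m) === 1) {
    $description = $m[1];
}

echo $title, "\n";        // Product Title
echo $description, "\n";  // product description here.
```

preg_match is enough when only one product is expected per string; preg_match_all would collect every match into $matches[1].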

Answer 2

This regular expression:

'/<h2>"([^"]*?)"<\/h2>/'

Use the function preg_match_all.
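As a rough illustration of why the negated character class can matter, the sketch below compares a greedy (.*) capture with ([^"]*?) on a made-up input that has two headings on one line. The input string and the variable names are my own assumptions, not taken from the question.

```php
<?php
// Hypothetical input: two quoted headings on a single line.
$html = '<h2>"First"</h2><h2>"Second"</h2>';

// Greedy .* runs to the last closing quote on the line.
preg_match_all('/<h2>"(.*)"<\/h2>/', $html, $greedy);

// [^"]* cannot cross a double quote, so each heading is matched separately.
preg_match_all('/<h2>"([^"]*?)"<\/h2>/', $html, $strict);

print_r($greedy[1]);  // one element spanning both headings
print_r($strict[1]);  // two elements: First, Second
```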

Are you sure the title is always enclosed in double quotes?

Your HTML code is invalid: a div has no closing tag.

Answer 3

Here is a possible way to get what you want using a regexp:

/<div class="descriptionArea-2"[^>]*>(?:\s*<[^h][^2][^>]*\/>)*\s*<h2>([^<]*)<\/h2>[^<]*<div class="displayDescription">([^<]*)</

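The above tries to match exactly the same hierarchy as the sample HTML given in the question. Replace the class strings as needed. If the h2 and the nested div tag (the one with the displayDescription class) appear in the opposite order, or if there are any other tags between them, the regex will not work.

The first returned value is the h2 text and the second is the inner div text.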
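A minimal sketch of reading both capture groups with preg_match follows. The pattern is written with \s* around the self-closing tags so that it tolerates the line breaks in the sample; treat that whitespace handling, and the variable names, as my assumptions. Since ([^<]*) also captures the literal double quotes, they are trimmed afterwards.

```php
<?php
// Sample markup copied from the question; $html is a hypothetical variable name.
$html = <<<'HTML'
<div class="descriptionArea-2" style="visibility: visible;">
<img src="(image Url Here)" />
<br />
<h2>"Product Title"</h2>
        <div class="displayDescription">"product description here."<div class="icons">icons</div></div>

</div>
HTML;

// Group 1 is the h2 text, group 2 is the inner div text.
$pattern = '/<div class="descriptionArea-2"[^>]*>(?:\s*<[^h][^2][^>]*\/>)*\s*<h2>([^<]*)<\/h2>'
         . '[^<]*<div class="displayDescription">([^<]*)</';

$title = null;
$description = null;

if (preg_match($pattern, $html, $m) === 1) {
    // Strip the surrounding double quotes that are part of the markup text.
    $title = trim($m[1], '"');
    $description = trim($m[2], '"');
}

echo $title, "\n";        // Product Title
echo $description, "\n";  // product description here.
```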

Another option is to use XPath, if your HTML document is well-formed. Here are the XPath expressions for each string:

//div[@class="descriptionArea-2"]/h2/text()

//div[@class="descriptionArea-2"]/div[@class="displayDescription"]/text()
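For reference, one way these two expressions could be evaluated in PHP is with DOMDocument and DOMXPath from the bundled dom extension (assumed to be enabled). This is only a sketch: the variable names are mine, and DOMDocument::loadHTML is fairly lenient, so it may cope with markup that is not strictly well-formed.

```php
<?php
// Sample markup copied from the question; $html is a hypothetical variable name.
$html = <<<'HTML'
<div class="descriptionArea-2" style="visibility: visible;">
<img src="(image Url Here)" />
<br />
<h2>"Product Title"</h2>
        <div class="displayDescription">"product description here."<div class="icons">icons</div></div>

</div>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// item(0) is the first matching text node, or null if nothing matched.
$titleNode = $xpath->query('//div[@class="descriptionArea-2"]/h2/text()')->item(0);
$descNode  = $xpath->query('//div[@class="descriptionArea-2"]/div[@class="displayDescription"]/text()')->item(0);

// Trim whitespace and the literal double quotes found in the sample text.
$title       = $titleNode !== null ? trim($titleNode->nodeValue, "\" \n\t") : null;
$description = $descNode !== null ? trim($descNode->nodeValue, "\" \n\t") : null;

echo $title, "\n";        // Product Title
echo $description, "\n";  // product description here.
```

Note that @class="descriptionArea-2" is an exact string comparison, so an element carrying additional class names would not match.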

How to extract the heading tag value and specific text inside a div from a string, using the class, in PHP
