Question

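I'm using XPath to extract text from articles. What I want to do is query the text and, if certain tags (note the plural) are present, keep those tags and their HTML. Another solution would be to retrieve the raw HTML from the XPath query, so that I can process it with PHP.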

Here is an example of an article:

    <html>
    <body>
        <div id="main">
            <div id="content">
                <p>Some content</p>
                <blockquote>Some blockquote</blockquote>
                <embed src="someembed source"></embed>
                <br/>
            </div>
        </div>
    </body>
    </html>

What I'm looking for is:

Some content (from the p tag)
<blockquote>Some blockquote</blockquote>
<embed src="someembed source"></embed>
<br/>

My XPath isn't designed to handle anything other than the <p> tag.

$xpath = '//div[@id="main"]//div[@id="content"]//p';

Answer 1

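Let me interpret your question this way: you want to find every occurrence of //div[@id="main"]//div[@id="content"] that also contains the specific combination of child tags you mentioned.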

You can select these div occurrences with the following XPath expression:

//div[@id="main"]//div[@id="content" and p and blockquote and embed and br]
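
To check what this predicate matches, here is a minimal PHP sketch that runs the expression against the sample article from the question. It assumes PHP's bundled DOM extension (DOMDocument and DOMXPath); the variable names are illustrative only.

```php
<?php
// Sample article copied from the question.
$html = <<<'HTML'
<html>
<body>
    <div id="main">
        <div id="content">
            <p>Some content</p>
            <blockquote>Some blockquote</blockquote>
            <embed src="someembed source"></embed>
            <br/>
        </div>
    </div>
</body>
</html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// The div is selected only if it has at least one p, blockquote, embed and br child.
$query = '//div[@id="main"]//div[@id="content" and p and blockquote and embed and br]';
$matches = $xpath->query($query);

// Number of div elements that satisfy the whole predicate.
echo $matches->length, "\n";
```

Inside a predicate, a bare element name such as `p` is true when at least one child element of that name exists, so removing any one of the four children from the sample should leave no match.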

If you only want the child nodes, you can also write:

//div[@id="main"]//div[@id="content" and p and blockquote and embed and br]/*
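
One way to turn those child nodes into the output the question asks for is sketched below. It is an assumption about how the result would be consumed in PHP, not part of the original answer: the text of `<p>` is kept as plain text, and every other child is serialized back to HTML with `DOMDocument::saveHTML($node)`. The `$parts` array is a hypothetical name used for illustration.

```php
<?php
// Sample article copied from the question.
$html = <<<'HTML'
<html>
<body>
    <div id="main">
        <div id="content">
            <p>Some content</p>
            <blockquote>Some blockquote</blockquote>
            <embed src="someembed source"></embed>
            <br/>
        </div>
    </div>
</body>
</html>
HTML;

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Select the child elements of the matching content div.
$children = $xpath->query(
    '//div[@id="main"]//div[@id="content" and p and blockquote and embed and br]/*'
);

$parts = [];
foreach ($children as $node) {
    if ($node->nodeName === 'p') {
        // For <p>, keep only the text content.
        $parts[] = trim($node->textContent);
    } else {
        // For any other element, keep the tag and its markup.
        $parts[] = $doc->saveHTML($node);
    }
}

echo implode("\n", $parts), "\n";
```

The serializer may normalize the markup (for instance, `<br/>` typically comes back as `<br>`), so the result is not guaranteed to match the input byte for byte.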
