Question

I'm using Symfony DomCrawler to get all of the text in a document.

use Symfony\Component\DomCrawler\Crawler;

$this->crawler->filter('p')->each(function (Crawler $node, $i) {
    // process the text of each <p> element, e.g. $node->text()
});

I'm trying to collect all of the text inside <body> that sits outside of any element:

<body>
    This is an example
    <p>
        blablabla
    </p>
    another example
    <p>
        <span>Yo!</span>
        again, another piece of text <br/>
        with an annoying BR in the middle
    </p>
</body>

I'm using PHP Symfony, and either XPath (preferred) or a regex is acceptable.

Answer 1

This simple XPath gets the string value of the entire document:

string(/)
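A minimal PHP sketch of evaluating string(/). It assumes PHP's built-in DOM extension (DOMDocument and DOMXPath, the classes DomCrawler wraps internally) rather than DomCrawler's own API, and uses a trimmed-down, whitespace-free version of the question's markup so the result is easy to read:

```php
<?php
// Compact version of the question's markup (no indentation whitespace).
$html = '<html><body>This is an example<p>blablabla</p>another example</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// string(/) is the string value of the root node: all text nodes in the
// document concatenated in document order, with no separator between them.
$all = $xpath->evaluate('string(/)');

echo $all, "\n"; // This is an exampleblablablaanother example
```

Because nothing separates the text nodes, adjacent pieces run together ("exampleblablabla"), and the text inside the p element is included along with the text outside it.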

All text nodes in the document:

//text()
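With the same setup, //text() can be iterated through DOMXPath::query(), which returns a DOMNodeList of DOMText nodes. This sketch trims each value and skips whitespace-only nodes; note that //text() would also match the contents of any script or style elements, which the sample does not contain:

```php
<?php
// Compact version of the question's markup (no indentation whitespace).
$html = '<html><body>This is an example<p>blablabla</p>another example</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$texts = [];
foreach ($xpath->query('//text()') as $textNode) {
    // Each match is a DOMText node; trim() drops surrounding whitespace.
    $value = trim($textNode->nodeValue);
    if ($value !== '') {
        $texts[] = $value;
    }
}

// $texts: ['This is an example', 'blablabla', 'another example']
print_r($texts);
```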

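The text nodes that are direct children of body (the text outside any child element, which is what the question asks for):

/body/text()

That path assumes body is the root element, as in the fragment above. When the markup is parsed as a full HTML document, body is a child of html, so use /html/body/text() (or //body/text()) instead.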
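A sketch applying this to the question's sample with PHP's DOM extension. It assumes the markup is loaded with DOMDocument::loadHTML(), whose libxml parser wraps the fragment in an implied html element, so the path starts at /html/body:

```php
<?php
// The question's sample markup, unchanged apart from the typo fix.
$html = <<<'HTML'
<body>
    This is an example
    <p>
        blablabla
    </p>
    another example
    <p>
        <span>Yo!</span>
        again, another piece of text <br/>
        with an annoying BR in the middle
    </p>
</body>
HTML;

$doc = new DOMDocument();
// libxml's HTML parser adds an implied <html>, so body is not the root.
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$outside = [];
foreach ($xpath->query('/html/body/text()') as $textNode) {
    $value = trim($textNode->nodeValue);
    if ($value !== '') { // skip whitespace-only nodes between elements
        $outside[] = $value;
    }
}

// $outside: ['This is an example', 'another example']
print_r($outside);
```

Text inside the p and span elements is excluded because only direct children of body are selected.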

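Note that, depending on the context, an XPath that selects text nodes is often converted to a string, and which string you get varies. In XPath 1.0 (the version implemented by libxml, and therefore by PHP's DOMXPath), converting a node-set to a string yields the string value of only the first node in document order, not the concatenation of all of them.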
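A small sketch of that difference in PHP, whose DOMXPath implements XPath 1.0: wrapping //text() in string() keeps only the first text node, while iterating over the node-set gets all of them (joining them with a space here is an arbitrary choice):

```php
<?php
// Compact version of the question's markup (no indentation whitespace).
$html = '<html><body>This is an example<p>blablabla</p>another example</body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// XPath 1.0: string(node-set) is the string value of the FIRST node only.
$first = $xpath->evaluate('string(//text())');

// To use every matching text node, iterate over the node-set instead.
$parts = [];
foreach ($xpath->query('//text()') as $textNode) {
    $parts[] = $textNode->nodeValue;
}
$joined = implode(' ', $parts);

echo $first, "\n";  // This is an example
echo $joined, "\n"; // This is an example blablabla another example
```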

How to get all text outside of elements in an HTML document
