Question

My HTML code is as follows:

<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>

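What I want to do is extract 'i want this text' and leave out all the other elements. I've tried several variations of the following, but none of them returns the text I need: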

$name = trim($page->find('span[class!=ignore^] a[class!=also^] span[class=phone]',0)->innertext);

Some guidance would be appreciated, since the simple_html_dom documentation on filters is quite sparse.

Answer 1

How about using PHP's preg_match (http://php.net/manual/en/function.preg-match.php)?

Try the following:

<?php

$html = <<<EOF
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
<span class="ignore-this-two" >01234567890</span>
<a class="also-ignore-me">some text</a>
</span>
EOF;

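// preg_match returns 1 on a match, 0 on no match, and false on error.
$result = preg_match('#class="phone".*\n(.*)#', $html, $matches);

if ($result === 1) {
    // trim() removes a trailing "\r" if the input uses Windows line endings.
    echo trim($matches[1]);
}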

?>

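The regular expression reads: find the text class="phone", then continue to the end of the line with .*, which matches any character except a newline. Then \n moves to the next line, and enclosing .* in parentheses captures everything on that line.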
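To make the pieces of the pattern easier to see, here is a sketch of the same expression written with the x (extended) flag, which lets each part carry its own comment. The ~ delimiter and the shortened sample markup are my own choices for illustration, not part of the original answer.

```php
<?php

$html = <<<EOF
<span class="phone">
i want this text
<span class="ignore-this-one">01234567890</span>
</span>
EOF;

// Same pattern as in the answer, split across lines with the x flag.
// The delimiter is ~ here because # starts a comment in extended mode.
$pattern = '~
    class="phone"   # the literal attribute text
    .*              # the rest of that line (. stops at a newline by default)
    \n              # the line break itself
    (.*)            # capture group 1: everything on the following line
~x';

if (preg_match($pattern, $html, $matches) === 1) {
    // trim() guards against a stray "\r" from Windows line endings.
    echo trim($matches[1]), PHP_EOL;
}
```

Note that this approach depends on the wanted text sitting on the line directly after the opening tag; if the markup is reformatted, the pattern will need adjusting.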

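The results are stored in the array $matches. $matches[0] holds the text matched by the entire regular expression, while $matches[1] holds the text captured by the parentheses.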
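The difference between the two entries can be checked with a short sketch like the one below. The sample string is a shortened version of the markup from the question, written with explicit "\n" line breaks so the behaviour does not depend on how the file is saved.

```php
<?php

// Shortened sample markup with explicit newlines (an assumption for illustration).
$html = "<span class=\"phone\">\ni want this text\n"
      . "<span class=\"ignore-this-one\">01234567890</span>\n</span>";

$found = preg_match('#class="phone".*\n(.*)#', $html, $matches);

if ($found === 1) {
    // $matches[0]: the full match, which spans two lines.
    var_dump($matches[0]);
    // $matches[1]: only the text captured by (.*).
    var_dump($matches[1]);
} else {
    echo "no match\n";
}
```

Here $matches[0] starts at class="phone" and runs through the end of the following line, while $matches[1] contains only that following line, which is the text the question asks for.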

Simple HTML DOM - how to ignore nested elements?
