I'm trying to remove everything from the following string except the object tag:
<p>If a post is marked video, and there is text BEFORE the video, the video player does not appear! We only see the actual text for the url…</p>
<p> </p>
<p><object width="584" height="463"><param value="http://www.youtube.com/v/Clp9AeBdgL0?version=3" name="movie"><param value="true" name="allowFullScreen"><param value="always" name="allowscriptaccess"><embed width="584" height="463" allowfullscreen="true" allowscriptaccess="always" type="application/x-shockwave-flash" src="http://www.youtube.com/v/Clp9AeBdgL0?version=3"></object></p>
<p>Of course, you might even have a paragraph AFTER the video. Could be lots and lots of meaningless text – we should definitely limit this. Lorem ipsum</p>
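As shown above, the third 'p' tag contains the 'object' tag. I want to get rid of everything except the 'object' tag and its contents. In other words, I want to traverse the DOM and remove everything except: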
<object width="584" height="463"><param value="http://www.youtube.com/v/Clp9AeBdgL0?version=3" name="movie"><param value="true" name="allowFullScreen"><param value="always" name="allowscriptaccess"><embed width="584" height="463" allowfullscreen="true" allowscriptaccess="always" type="application/x-shockwave-flash" src="http://www.youtube.com/v/Clp9AeBdgL0?version=3"></object>
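By traversing the DOM, I was able to write a function that removes any specific tag (p, img, div, etc.) and its contents from the string, but I can't figure out how to keep a child tag like this one. Can anyone help?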
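Answer 0 (score: 1)
Rather than traversing the DOM with an XML-parsing object (which is what it sounds like you're doing; apologies if I'm wrong), I'd suggest simply running a regular-expression search on your string.
Edit: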
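The pattern would look something like '/<object .*<\/object>/'. You can test PHP regular expressions online; I used the preg_match() function. Also, if there is more than one <object> per page, make sure you don't use a "greedy" match (for example, use .*? rather than .*), otherwise a single match will run from the first <object> to the last </object>. Finally, this won't work with nested objects, though I don't expect you have any.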
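To make the greedy-versus-lazy point concrete, here is a minimal sketch of how several <object> elements might be collected with a lazy pattern. The sample HTML, the variable names, and the choice of preg_match_all() are assumptions for illustration; the original suggestion only used preg_match().

```php
<?php
// Hypothetical sample input containing two <object> elements,
// the second of which spans a line break.
$html = '<p>Intro</p>'
      . '<p><object width="1"><param name="movie" value="a"></object></p>'
      . '<p>Middle</p>'
      . "<p><object width=\"2\">\n<param name=\"movie\" value=\"b\"></object></p>";

// .*? is lazy, so each match stops at the nearest </object>.
// The s modifier lets . match newlines; i ignores the case of tag names.
$pattern = '/<object\b.*?<\/object>/si';

$objects = array();
if (preg_match_all($pattern, $html, $matches)) {
    // $matches[0] holds every full match, in document order.
    $objects = $matches[0];
}

// Join the kept elements back into one string.
$result = implode("\n", $objects);
echo $result, "\n";
```

As with the simpler pattern, this still breaks on nested <object> elements, because the lazy match ends at the first closing tag it finds.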
So the whole snippet might be:
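// Matches from the first "<object " to the last "</object>" (greedy).
$pattern = '/<object .*<\/object>/';
// Placeholder value: replace with your string containing the HTML.
$subject = '<p>Intro</p><p><object width="584" height="463"></object></p>';
$matches = array();
if (preg_match($pattern, $subject, $matches))
{
    // $matches[0] is the text matched by the whole pattern.
    echo $matches[0];
}
else
{
    echo "No match found.";
}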
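If you would rather stay with DOM traversal, as the question describes, one possible alternative is to pick out the <object> nodes and serialise only those, instead of deleting everything around them. This is only a sketch under assumptions: it uses PHP's DOMDocument (the question does not say which parser was used), and the sample HTML and variable names are made up for illustration.

```php
<?php
// Hypothetical sample input; replace with your own HTML string.
$html = '<p>Before</p>'
      . '<p><object width="584" height="463"><param name="movie" value="x"></object></p>'
      . '<p>After</p>';

$doc = new DOMDocument();
// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$kept = '';
foreach ($doc->getElementsByTagName('object') as $object) {
    // saveHTML($node) serialises just that node and its children.
    $kept .= $doc->saveHTML($object);
}
echo $kept, "\n";
```

Note that getElementsByTagName() also returns nested <object> elements, so nested markup would be emitted twice unless you skip nodes whose ancestor is already an <object>.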