Find <p> tags in a very long text

Time: 2012-11-14 11:06:48

Tags: php

I have a very long HTML text, and in PHP I want to give each p tag an incrementing id value. My original string:

$mystring="
<p> my very long text with a lot of words ....</p>
<p></p>
<p> my other paragraph with a very long text ...</p>
(...)
";

The result I want:

$myparsestring= "
<p id=1>my very long text with a lot of words ....</p>
<p id=2> my other paragraph with a very long text ...</p>
";

As you can see, I could use getElementsByTagName() and a regular expression (possibly a split).

What approach would you recommend for this job?

2 answers:

Answer 0: (score: 3)

If you intend to parse HTML, try using DOM with XPath.

Here is a simple example:

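// DOMXPath expects a DOMDocument, so load the HTML string first.
$doc = new DOMDocument();
$doc->loadHTML($mystring);
$xpath = new DOMXPath($doc);
$query = '//*/p';
$entries = $xpath->query($query);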

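Don't use regular expressions for this. If all you intend to do is parse HTML like this, use the DOM method, unless you have a specific reason to use a regex.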
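To take the DOM approach all the way to the result asked for in the question, a minimal sketch might look like the following. The function name `addParagraphIds` and the temporary wrapper `div` are assumptions made for illustration and are not part of the original answer. The sketch also assumes that a paragraph counts as empty when it contains no text.

```php
<?php
// Hypothetical helper: give every non-empty <p> an incrementing id
// and remove the empty ones, as in the question's desired result.
function addParagraphIds(string $html): string
{
    // Collect parser warnings instead of printing them for sloppy markup.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment so it can be extracted again after parsing.
    $doc->loadHTML('<div id="wrapper">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $wrapper = $xpath->query('//div[@id="wrapper"]')->item(0);

    $id = 0;
    // Copy the node list to an array so nodes can be removed while looping.
    foreach (iterator_to_array($xpath->query('.//p', $wrapper)) as $p) {
        if (trim($p->textContent) === '') {
            $p->parentNode->removeChild($p); // assumption: no text means empty
            continue;
        }
        $p->setAttribute('id', (string) ++$id);
    }

    // Serialize only the children of the wrapper, not the whole document.
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

$result = addParagraphIds("<p> first paragraph</p>\n<p></p>\n<p> second paragraph</p>\n");
echo $result;
```

Note that the DOM serializer writes the attribute with quotes (`id="1"`), which differs slightly from the unquoted `id=1` shown in the question.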

Answer 1: (score: 0)

You can do it with a regex like this:

$mystring="
<p> my very long text with a lot of words ....</p>
<p></p>
<p> my other paragraph with a very long text ...</p>
(...)
";

// This collects every <p> tag that has some text in it; empty <p></p> tags are skipped.
// The original lookarounds were redundant, so the pattern is reduced to its equivalent.
preg_match_all('/<p>[^<>]+<\/p>/', $mystring, $matches);

$myparsestring = '';
for( $k=0; $k<sizeof( $matches[0] ); $k++ )
{
    $myparsestring .= str_replace( '<p', '<p id='.($k+1), $matches[0][$k] );
}

echo htmlspecialchars( $myparsestring );

Output/result:

<p id=1> my very long text with a lot of words ....</p>
<p id=2> my other paragraph with a very long text ...</p>
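The loop above keeps only the matched paragraphs, so any other markup in the string is discarded. If the surrounding content should stay in place, one possible variation is to rewrite the tags in the original string with `preg_replace_callback`. This is only a sketch: the function name `numberParagraphs` is hypothetical, and it assumes that every paragraph opens with a bare `<p>` tag with no attributes, as in the question.

```php
<?php
// Hypothetical helper: number the <p> tags in place and keep all other markup.
function numberParagraphs(string $html): string
{
    // Remove paragraphs that are empty or contain only whitespace.
    $html = preg_replace('/<p>\s*<\/p>\s*/', '', $html);

    $id = 0;
    // Replace each remaining bare <p> with a numbered one.
    return preg_replace_callback(
        '/<p>/',
        function (array $m) use (&$id): string {
            return '<p id=' . (++$id) . '>';
        },
        $html
    );
}

$numbered = numberParagraphs("<p> first</p>\n<p></p>\n<p> second</p>\n");
echo htmlspecialchars($numbered);
```

The counter is captured by reference (`use (&$id)`) so it keeps increasing across callback invocations.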