Question

I have the following HTML:

<p>Some text <a title="link" href="http://link.com/" target="_blank">my link</a> more 
text <a title="link" href="http://link.com/" target="_blank">more link</a>.</p>
<p>Another paragraph.</p>
<p>[code:cf]</p>
<p>&lt;cfset ArrFruits = ["Orange", "Apple", "Peach", "Blueberry", </p>
<p>"Blackberry", "Strawberry", "Grape", "Mango", </p>
<p>"Clementine", "Cherry", "Plum", "Guava", </p>
<p>"Cranberry"]&gt;</p>
<p>[/code]</p>
<p>Another line</p>
<p><img src="http://image.jpg" alt="Array" />
</p>
<p>More text</p>
<p>[code:cf]</p>
<p>&lt;table border="1"&gt;</p>
<p> &lt;cfoutput&gt;</p>
<p> &lt;cfloop array="#GroupsOf(ArrFruits, 5)#" index="arrFruitsIX"&gt;</p>
<p>  &lt;tr&gt;</p>
<p> &lt;cfloop array="#arrFruitsIX#" index="arrFruit"&gt;</p>
<p>     &lt;td&gt;#arrFruit#&lt;/td&gt;</p>
<p> &lt;/cfloop&gt;</p>
<p>  &lt;/tr&gt;</p>
<p> &lt;/cfloop&gt;</p>
<p> &lt;/cfoutput&gt;</p>
<p>&lt;/table&gt;</p>
<p>[/code]</p>
<p>With an output that looks like:</p>
<p><img src="another_image.jpg" alt="" width="342" height="85" /></p>

What I'm trying to do is write a regular expression that removes every <p> and </p> tag, replacing each </p> it finds with a line break.

So far, my pattern looks like this:

/\<p\>(.*?)(<\/p>)/g

and I'm replacing the matches with:

$1\n

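This all looks fine, but it also replaces the <p> tags inside the [code][/code] blocks, where they should not be replaced at all. So I'd like to get rid of the <p> tags only when the content is not inside a [code] block.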
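To make the problem concrete, here is a minimal Python reproduction of the pattern and replacement above. It assumes a shortened, hypothetical stand-in for the HTML; Python's `re.sub` replaces every match (like the `g` flag) and writes the group reference as `\1` instead of `$1`.

```python
import re

# Hypothetical, shortened version of the question's HTML.
html = (
    "<p>Another paragraph.</p>\n"
    "<p>[code:cf]</p>\n"
    "<p>&lt;cfset x = 1&gt;</p>\n"
    "<p>[/code]</p>\n"
    "<p>Another line</p>\n"
)

# The question's /\<p\>(.*?)(<\/p>)/g with replacement $1\n, in Python syntax.
naive = re.sub(r"<p>(.*?)</p>", r"\1\n", html)

# The <p> tags inside the [code]...[/code] block are stripped as well.
print(naive)
```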

I can't get the negation right. I know it would be something along the lines of

\<p\>^\[code*\](.*?)(<\/p>)

but obviously that doesn't work :-)
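For reference, a short Python check of why that attempt fails: in the middle of a pattern `^` is a start-of-line anchor, not a negation, and `code*` means "cod" followed by zero or more "e", so `\[code*\]` would not even match `[code:cf]`. The negative-lookahead version at the end is only an illustration; it excludes the marker paragraphs themselves but still cannot tell whether a paragraph sits between `[code]` and `[/code]`.

```python
import re

sample = "<p>[code:cf]</p>\n<p>Another line</p>\n"

# ^ only matches at the start of a line (with MULTILINE). Right after "<p>"
# the previous character is ">", never a newline, so this cannot match.
attempt = re.compile(r"<p>^\[code*\](.*?)(</p>)", re.MULTILINE)
print(attempt.search(sample))  # None

# (?!...) says "not followed by": skip paragraphs that start with a
# [code...] or [/code] marker.
marker_aware = re.compile(r"<p>(?!\[/?code)(.*?)</p>")
print(marker_aware.findall(sample))  # ['Another line']
```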

Can anyone help me with this regex?

BTW, I know I shouldn't use regular expressions to parse HTML. I'm fully aware of that, but for this specific case I still want to use a regex.

Thanks in advance.

Answer 1

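I'm assuming you have special knowledge of the application that generates the HTML you are trying to parse; otherwise you would not even be considering regular expressions for the task. (Part of that knowledge, I assume, is that an opening <p> tag always appears right after a newline and a closing </p> tag always appears right before one.)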

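That being said, you cannot easily or efficiently achieve what you want with a regular expression alone. You would have to use complicated nested lookbehind and lookahead assertions to verify that each <p>...</p> occurrence is not inside a [code]...[/code] block, and variable-length lookbehind assertions are particularly limited (and, IIRC, outright buggy before JDK 1.6).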

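Instead, you should first iterate over the input sequence, splitting it into code and non-code chunks, and copy the chunks to the output sequence either unchanged (for code chunks) or with the replacement applied via a regex or a simple string replacement (for non-code chunks).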
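A minimal Python sketch of that chunking approach, under stated assumptions: an opening marker is `[code` followed by anything up to `]` (as in `[code:cf]`), blocks close with `[/code]`, and blocks are never nested or mismatched. The function name `strip_paragraphs_outside_code` is hypothetical.

```python
import re

# Assumed markers: "[code...]" opens, "[/code]" closes, no nesting. The optional
# <p>/</p> around the markers are included so the block is copied verbatim.
CODE_BLOCK = re.compile(r"(?:<p>)?\[code[^\]]*\].*?\[/code\](?:</p>)?", re.DOTALL)
# DOTALL lets a paragraph span lines, like the <img> paragraph in the question.
PARAGRAPH = re.compile(r"<p>(.*?)</p>", re.DOTALL)


def strip_paragraphs_outside_code(html: str) -> str:
    """Replace each <p>...</p> with its content plus a newline, except in code blocks."""
    out = []
    pos = 0
    for block in CODE_BLOCK.finditer(html):
        # Non-code chunk before this block: apply the replacement.
        out.append(PARAGRAPH.sub(r"\1\n", html[pos:block.start()]))
        # Code chunk: copy it to the output unchanged.
        out.append(block.group(0))
        pos = block.end()
    # Whatever follows the last code block is a non-code chunk again.
    out.append(PARAGRAPH.sub(r"\1\n", html[pos:]))
    return "".join(out)


sample = (
    "<p>Intro</p>\n"
    "<p>[code:cf]</p>\n"
    "<p>&lt;cfset x = 1&gt;</p>\n"
    "<p>[/code]</p>\n"
    "<p>Outro</p>"
)
print(strip_paragraphs_outside_code(sample))
```

Only "Intro" and "Outro" lose their tags; the code block, including its marker paragraphs, comes through untouched.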

Whether (or how) you need to handle nested or mismatched code blocks is up to you.

Answer 2

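"BTW, I know I shouldn't use regular expressions to parse HTML. I'm fully aware of that, but for this specific case I still want to use a regex."

Could you explain that?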

Answer 3

The syntax for a negative lookahead is (?!...).

(?!\[code.*?\]([^\[]|\[\/[^c]|\[\/c[^o]|\[\/co[^d]|\[\/cod[^e]|\[\/code[^\]])*).*?
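The pattern above is only a fragment. As a hedged sketch of how a negative lookahead can do the whole job in one Python regex (Python's `re` rejects variable-length lookbehind, so the check looks forward instead), assuming markers of the form `[code...]` / `[/code]` and no nesting: a paragraph is replaced only if, scanning forward from its `</p>`, a `[/code]` is not reached before the next `[code`. The name `OUTSIDE_CODE` is hypothetical. Note that the lookahead can rescan much of the remaining input after every `</p>`, so on large documents splitting the input into chunks is likely cheaper.

```python
import re

# Assumptions: markers look like "[code...]" / "[/code]"; blocks are not nested.
OUTSIDE_CODE = re.compile(
    r"<p>(?!\[/code\])"                # skip the paragraph holding the closing marker
    r"((?:(?!</p>).)*)"                # content; may not run past its own </p>
    r"</p>"
    r"(?!(?:(?!\[code).)*\[/code\])",  # no [/code] ahead before the next [code
    re.DOTALL,
)

sample = (
    "<p>Intro</p>\n"
    "<p>[code:cf]</p>\n"
    "<p>&lt;cfset x = 1&gt;</p>\n"
    "<p>[/code]</p>\n"
    "<p>Outro</p>"
)
result = OUTSIDE_CODE.sub(r"\1\n", sample)
print(result)
```

The content group is tempered with `(?!</p>)` rather than written as `.*?`: with a plain lazy `.*?`, a failed lookahead makes the engine extend the match across several paragraphs until the lookahead succeeds, which would strip tags around a whole code block.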

Regex to replace <p> tags with line breaks
