Question

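I wrote a regular expression in Rad Soft to extract a div tag from some HTML. It tested fine in the editor and in a PHP regex tester. Now I'm trying to use it from PHP in my current project, and it just doesn't work. It's usually some simple fix that I'm not seeing because I've been staring at the problem for too long.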

Here is the test sample I'm using:

$temp = '<div>blah blah blah.
    You ordered these items: 
    <div id="product_list">

 - 1  Large -  Target
<span class="pricedisplay">R$ 1,00</span>
- 1  Large - Black
<span class="pricedisplay">R$ 1,00</span>
- 1  Large - Leather
<span class="pricedisplay">R$ 1,00</span>
- 1  Large - Preto
<span class="pricedisplay">R$ 1,00</span>

</div><span class="total-shipping"></span><span class="total-price">Total: <span class="pricedisplay">R$ 1,00</span>
</span></div>';

And the regular expression:

<div\sid="product_list">([\s\w\d<=>/\\\$:",\.-]?)*[^(</div>)]</div>

Here is where I try to use it:

$outputHTML = preg_replace_callback(preg_quote('<div\sid="product_list">([\s\w\d<=>/\\\$:",\.-]?)*[^(</div>)]</div>'), 'findProducts', $temp);

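I've tried it with and without preg_quote. If you know of any software similar to Rad that emulates PHP's regex engine more closely, that would help a lot. Thanks in advance!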

Answer 1

Whether or not this regex worked in some tester, there is a lot wrong with it.

First:

[\s\w\d<=>/\\\$:",\.-]

is equivalent to:

[\s\w\d<=>/$:",.\\-]

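You don't need to escape these characters inside a character class. Also, you are running the pattern through preg_quote, which escapes: . \ + * ? [ ^ ] $ ( ) { } = ! < > | : - so every metacharacter in the pattern is turned into a literal and the regex no longer behaves as a regex.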
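
A minimal sketch of that preg_quote effect, assuming a shortened pattern and subject of my own rather than the full ones from the question. It also shows the case where preg_quote is usually the right tool: escaping a literal fragment (the `$id` variable here is hypothetical) before splicing it into a larger pattern.

```php
<?php
// A pattern whose \s must stay "live" as a regex token.
$pattern = '<div\sid="product_list">';

// preg_quote() backslash-escapes every metacharacter, so \s stops meaning
// "any whitespace" and becomes a literal backslash followed by "s".
$quoted = preg_quote($pattern, '#');

$subject = '<div id="product_list">';

$matchesRaw    = preg_match('#' . $pattern . '#', $subject); // \s matches the space
$matchesQuoted = preg_match('#' . $quoted . '#', $subject);  // looks for the literal text

// Where preg_quote() does belong: escaping a literal fragment
// (a hypothetical id value) before inserting it into a pattern.
$id = 'product_list';
$matchesSpliced = preg_match(
    '#<div\sid="' . preg_quote($id, '#') . '">#',
    $subject
);
```

Here the unquoted pattern matches the subject, while the quoted one only matches the pattern text itself.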

Second:

[^(</div>)]

means:

"
[^(</div>)]    # Match a single character NOT present in the list “(</div>)”
"

whereas what you probably meant to write is:

(?<!</div>)

"
(?<!         # Assert that it is impossible to match the regex below with the match ending at this position (negative lookbehind)
   </div>       # Match the characters “</div>” literally
)
"

Finally:

([\s\w\d<=>/\\\$:",\.-]?)* is equivalent to: [\s\w\d<=>/$:",.\\-]*
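
The second and last points can be checked directly. This is a small sketch using toy subjects and a toy `[ab]` class of my own, not the question's data: it shows that a negated class matches one character, that the lookbehind only inspects the text to its left, and that the optional-group-in-a-star form accepts the same strings as a plain star.

```php
<?php
// [^(</div>)] is a negated character class: it matches ONE character that
// is none of ( < / d i v > ). It does not mean "not the string </div>".
$class = '#^[^(</div>)]$#';

$rejectsD  = preg_match($class, 'd'); // "d" is in the excluded set
$rejectsLt = preg_match($class, '<'); // so is "<"
$acceptsX  = preg_match($class, 'x'); // any other single character passes

// The negative lookbehind consumes nothing; it only asserts that the text
// immediately left of the current position is not "</div>".
$lookbehind = '#(?<!</div>)X#';

$afterDiv  = preg_match($lookbehind, '</div>X');  // "</div>" sits right before X
$afterSpan = preg_match($lookbehind, '</span>X'); // something else precedes X

// ([ab]?)* and [ab]* accept exactly the same strings.
$nestedOk = preg_match('#^([ab]?)*$#', 'abba');
$plainOk  = preg_match('#^[ab]*$#', 'abba');
$nestedNo = preg_match('#^([ab]?)*$#', 'abc');
$plainNo  = preg_match('#^[ab]*$#', 'abc');
```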

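Oh, and what you are doing is potentially dangerous. When your language already has tools built for the job, you should not try to extract or parse HTML / XML / XHTML with regular expressions.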
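
As one possible illustration of "tools built for the job", here is a sketch that uses PHP's bundled DOM extension (`DOMDocument` and `DOMXPath`) on a shortened copy of the sample. It assumes the DOM extension is enabled, and the variable names are my own.

```php
<?php
// Shortened version of the question's sample markup.
$html = '<div>You ordered these items:
<div id="product_list">
 - 1  Large - Black
<span class="pricedisplay">R$ 1,00</span>
</div><span class="total-price">Total: <span class="pricedisplay">R$ 1,00</span></span></div>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$nodes = $xpath->query('//div[@id="product_list"]');

$productHtml = '';
$prices = [];
if ($nodes !== false && $nodes->length > 0) {
    $list = $nodes->item(0);
    $productHtml = $doc->saveHTML($list); // the div together with its markup
    // Only the price spans nested inside the product list.
    foreach ($xpath->query('.//span[@class="pricedisplay"]', $list) as $span) {
        $prices[] = trim($span->textContent);
    }
}
```

Because the parser tracks nesting, the closing tag of the product list is found correctly even if further divs are nested inside it, which a regex cannot guarantee.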

Answer 2

PHP regular expressions need delimiters, like this:

/hello/ the delimiter here is /
#<div\sid="\w+">([^<]+)</div># the delimiter here is #

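You can use almost any character you like (anything that is not alphanumeric, a backslash, or whitespace). You just have to make sure the delimiter does not appear in the regex itself, or escape it where it does. In some cases it is better to switch to another delimiter, as in the second example, where the / character is used in </div>.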
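
A short sketch of the delimiter choice described above, using the second example pattern against a made-up subject string. The last call shows what happens when the delimiters are left out.

```php
<?php
$subject = '<div id="total">R$ 1,00</div>';

// With / as the delimiter, the / inside </div> has to be escaped.
$slash = preg_match('/<div\sid="\w+">([^<]+)<\/div>/', $subject, $m1);

// With # as the delimiter, the pattern can be written as-is.
$hash = preg_match('#<div\sid="\w+">([^<]+)</div>#', $subject, $m2);

// Without delimiters PHP does not see a valid pattern: it emits a warning
// and returns false (@ only silences the warning for this demo).
$none = @preg_match('<div\sid="\w+">([^<]+)</div>', $subject);
```

Checking the return value with `=== false` is what distinguishes a broken pattern from a pattern that simply did not match (which returns 0).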

I haven't looked closely at your regex, but you could try:

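Reducing the text under test: remove parts of it and test again to find the part that may be causing the problem
Reducing the regex for the same reason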

~~Also, your text is multi-line, so you should add a pattern modifier.~~ See Alan's comment below

Answer 3

I think you need to add proper delimiters around the regex, for example:

$outputHTML = preg_replace_callback(
    '#<div\sid="product_list">([\s\w\d<=>/\\\$:",\.-]?)*[^(</div>)]</div>#', 
    'findProducts', 
    $temp);

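In the example, # is the delimiter, marking where the regex begins and ends. This is required so that PHP can tell whether any modifiers follow the regex (for example #<some regex here>#i, where i is a modifier). See Delimiters.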
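
To show delimiters and a trailing modifier working together, here is a hedged sketch. The question never shows `findProducts`, so the callback body below is hypothetical, and the pattern is a simplification of mine (lazy `.*?` plus the `s` modifier), not the asker's original expression.

```php
<?php
// Hypothetical callback: it just counts the price spans inside the
// matched div and returns a replacement string.
function findProducts(array $matches): string
{
    $count = substr_count($matches[1], 'class="pricedisplay"');
    return '<p>' . $count . ' item(s)</p>';
}

$temp = '<div>You ordered these items:
<div id="product_list">
 - 1  Large - Black
<span class="pricedisplay">R$ 1,00</span>
 - 1  Large - Leather
<span class="pricedisplay">R$ 1,00</span>
</div><span class="total-price">Total</span></div>';

// Simplified pattern (an assumption, not the original): lazy .*? stops at
// the first </div>; the s modifier after the closing # lets . match newlines.
$outputHTML = preg_replace_callback(
    '#<div\sid="product_list">(.*?)</div>#s',
    'findProducts',
    $temp
);
```

Stopping at the first `</div>` works for this sample but would cut the match short if the product list ever contained a nested div.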
