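MySQL REGEXP to match fields that contain a non-anchor tag whose href attribute contains a pattern

Time: 2014-04-28 10:38:09

Tags: mysql regex

I am trying to find all rows in the database whose field contains a non-anchor tag with an href attribute that starts with the string {clickurl}. For example, this one -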

<link foo="bar" href="{clickurl}http://wwww.google.com" ...

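Or this one (because it contains a non-anchor tag that meets the criteria) -     HTTP://wwww.google.com" ...     HTTP://wwww.google.com" ...

But not this one (because it is an anchor tag) -     HTTP://wwww.google.com" ...

What I have done so far

Using the following regular expression, I can get all records where a link tag has an href attribute starting with {clickurl} -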

SELECT bannerid FROM ox_banners WHERE htmltemplate REGEXP "<link[^>]*href\s*=\s*[\"'][^>]*{clickurl}(.*)[\"']"

However, since I need to search not only link tags but any other tag (excluding anchor tags), I changed the regular expression to -

SELECT bannerid FROM ox_banners WHERE htmltemplate REGEXP "<[!a][^>]*href\s*=\s*[\"'][^>]*{clickurl}(.*)[\"']"

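But this also returns rows where an anchor tag contains the pattern.

Update

With input from zx81, I am now using the expression <[^a][^>]*href[[:space:]]*=[[:space:]]*[\"'][^>]*{clickurl}(.*)[\"'], and in normal cases only non-anchor tags match. However, in a case like the one below, where the href attribute is on a tag inside an echo statement within PHP tags, it also matches (which is unwanted), because the href actually belongs to an anchor tag -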

<?php

$GLOBALS['test'] = '{clickurl}tel://test';

echo '<a href="{clickurl}test">Test</a>';

?>

I am still looking for a solution to this.
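
One way to see why the PHP case still matches: [^>]* is free to run across line breaks and across the opening < of another tag, so a match can start at <?php and end inside the anchor. The sketch below reproduces this with Python's re module standing in for MySQL's REGEXP (an assumption; " *" replaces "[[:space:]]*" because Python does not support that class). The tightened pattern, which also excludes < from the attribute run, is only a suggestion and is not part of the original thread.

```python
import re

# Python's re stands in for MySQL REGEXP here; ' *' replaces '[[:space:]]*'.
LOOSE = re.compile(r"""<[^a][^>]*href *= *["'][^>]*\{clickurl\}""", re.IGNORECASE)
# Hypothetical tightening: [^<>]* cannot cross into a nested tag.
TIGHT = re.compile(r"""<[^a][^<>]*href *= *["'][^<>]*\{clickurl\}""", re.IGNORECASE)

php_template = """<?php

$GLOBALS['test'] = '{clickurl}tel://test';

echo '<a href="{clickurl}test">Test</a>';

?>"""

link_template = '<link foo="bar" href="{clickurl}http://wwww.google.com">'


def matches(pattern, text):
    """Return True if the pattern is found anywhere in the text."""
    return pattern.search(text) is not None


print(matches(LOOSE, php_template))   # True: the match starts at "<?php"
print(matches(TIGHT, php_template))   # False: the run stops before "<a"
print(matches(TIGHT, link_template))  # True: a real non-anchor tag
```

Two caveats apply to this sketch: [^a] also rejects any other tag whose name starts with "a", and a literal > inside a quoted attribute value that precedes href would still break the match.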

2 answers:

Answer 0 (score: 2)

Try this:

SELECT bannerid FROM ox_banners WHERE htmltemplate REGEXP ".*<[^a][^>]*href=\"\\{clickurl\\}.*";


Options: Case insensitive; Regex syntax only
Match any single character that is NOT a line break character (line feed) «.*»
   Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»
Match the character “<” literally «<»
Match any single character that is NOT present in the list below and that is NOT a line break character (line feed) «[^a]»
   The literal character “a” (case insensitive) «a»
Match any single character that is NOT present in the list below and that is NOT a line break character (line feed) «[^>]*»
   Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»
   The literal character “>” «>»
Match the character string “href="” literally (case insensitive) «href="»
Match the character “{” literally «\{»
Match the character string “clickurl” literally (case insensitive) «clickurl»
Match the character “}” literally «\}»
Match any single character that is NOT a line break character (line feed) «.*»
   Between zero and unlimited times, as few or as many times as needed to find the longest match in combination with the other quantifiers or alternatives «*»
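
The breakdown above can be checked against a few sample inputs. This is a minimal sketch that uses Python's re module as a stand-in for MySQL's REGEXP (an assumption; the pattern uses only constructs both engines share). The "anchor" and "spaced" samples are made up for illustration.

```python
import re

# Answer 0's pattern with the SQL string escaping removed.
# IGNORECASE mirrors the "Case insensitive" option listed in the breakdown.
PATTERN = re.compile(r'.*<[^a][^>]*href="\{clickurl\}.*', re.IGNORECASE)

samples = {
    "link": '<link foo="bar" href="{clickurl}http://wwww.google.com">',
    "anchor": '<a href="{clickurl}http://wwww.google.com">',       # hypothetical
    "spaced": '<link href = "{clickurl}http://wwww.google.com">',  # hypothetical
}

results = {name: PATTERN.search(text) is not None for name, text in samples.items()}
print(results)  # {'link': True, 'anchor': False, 'spaced': False}
```

The "spaced" sample shows that this pattern does not tolerate whitespace around = or single-quoted values. The leading and trailing .* are optional, since REGEXP already matches anywhere in the field.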

Answer 1 (score: 1)

Try this regular expression:

< *[^a][^>]+ *href *= *"{clickurl}

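You were almost there. It looks like you had a small typo: you wrote [!a] instead of [^a], which means "one character that is not a".

[^a] and [^>] work almost the same way. I'm sure you know this, but in both cases the ^ means "not", so [^>] is any character that is not >.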
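
The difference between the two classes can be seen in a short sketch. Python's re module is used here as a stand-in for MySQL's REGEXP (an assumption; bracket expressions behave the same way in both), and the two sample tags are illustrative.

```python
import re

anchor = '<a href="{clickurl}http://wwww.google.com">'
link = '<link href="{clickurl}http://wwww.google.com">'

typo = re.compile(r'<[!a][^>]*href')     # class containing "!" and "a"
negated = re.compile(r'<[^a][^>]*href')  # any one character except "a"

print(bool(typo.search(anchor)), bool(typo.search(link)))        # True False
print(bool(negated.search(anchor)), bool(negated.search(link)))  # False True
```

So [!a] selects exactly the tags that should have been excluded: those whose first character after < is "a" (or "!").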

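If you want to allow not only space characters but other kinds of whitespace as well, you can use [[:space:]]* instead of " *".

Thanks to Tuga for reminding me that \s does not work in MySQL: it matches a literal "s". I was "spaced out" on that one. :)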
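
For patterns written in Perl-style syntax, the \s shorthand can be swapped for the POSIX class before the pattern is sent to MySQL. This is a minimal sketch, assuming a hypothetical helper named to_posix_space and a pattern in which every \s is meant as whitespace and sits outside any bracket expression.

```python
def to_posix_space(pattern: str) -> str:
    """Replace the Perl-style \\s shorthand with the POSIX [[:space:]] class."""
    # Hypothetical helper: a plain text substitution, so it assumes every
    # backslash-s in the pattern is a whitespace shorthand outside any [...].
    return pattern.replace(r"\s", "[[:space:]]")


# The first pattern from the question, without the SQL string escaping.
question_pattern = r"""<link[^>]*href\s*=\s*["'][^>]*{clickurl}(.*)["']"""
converted = to_posix_space(question_pattern)
print(converted)
# <link[^>]*href[[:space:]]*=[[:space:]]*["'][^>]*{clickurl}(.*)["']
```

The double quote still has to be escaped again when the converted pattern is embedded in a double-quoted SQL string, as in the queries above.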