Question

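I'm trying to process mail in a small private ticketing system that automatically turns URLs into clickable links without messing up any HTML that may be posted. So far the URL parsing has worked well, but one or two users of the system would like to be able to post embedded images instead of attachments.

This is the existing code that converts strings into clickable URLs. Please note that I know very little about regular expressions and relied on help from others to build this: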

    $text = preg_replace(
     array(
       '/(^|\s|>)(www.[^<> \n\r]+)/iex',
       '/(^|\s|>)([_A-Za-z0-9-]+(\\.[A-Za-z]{2,3})?\\.[A-Za-z]{2,4}\\/[^<> \n\r]+)/iex',
       '/(?(?=<a[^>]*>.+<\/a>)(?:<a[^>]*>.+<\/a>)|([^="\']?)((?:https?):\/\/([^<> \n\r]+)))/iex'
     ),  
     array(
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\3':'\\0'))",
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"http://\\2\" target=\"_blank\">\\2</a>&nbsp;\\4':'\\0'))",
       "stripslashes((strlen('\\2')>0?'\\1<a href=\"\\2\" target=\"_blank\">\\3</a>&nbsp;':'\\0'))",
     ), $text);

    return $text;

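How can I modify an existing function (such as the one above) to exclude matches that sit inside HTML tags such as `<img`, without harming its functionality?

Example: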

`<img src="https://example.com/image.jpg">`

becomes

`<img src="<a href="https://example.com/image.jpg" target="_blank">example.com/image.jpg</a>">`

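I did some searching before posting and looked through the top hits.

Clearly the common trend is "this is the wrong way to do it", which is obviously correct. While I agree, I also want to keep the feature lightweight. The system is used privately within the organization, and we only want this feature to automatically process img tags and URLs. Everything else is left plain: no lists, code tags, quotes, etc.

Thank you very much for your help.

TL;DR: How do I modify the existing regex rule set to exclude matches found inside img or other HTML tags in a block of text?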

Answer 1

From what I can gather from the error about the `e` modifier, your PHP version is at most PHP5.4. preg_replace_callback() is available in PHP5.4, so this might be a tight squeeze!
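For context, the `e` modifier was deprecated in PHP 5.5.0 and removed in PHP 7.0.0, so each `/e` rule has to become a callback sooner or later. As a rough sketch only, here is how the first rule from the question (the bare `www.` case) might look with preg_replace_callback(). The function name `linkWwwHosts` is made up for illustration, the dot after `www` is escaped, and the trailing `&nbsp;` from the original replacement is left out to keep it short.

```php
<?php
// Hypothetical helper: links bare "www." addresses without the /e modifier.
function linkWwwHosts($text)
{
    return preg_replace_callback(
        '/(^|\s|>)(www\.[^<> \n\r]+)/i',
        function ($m) {
            // $m[1] is the leading delimiter (start, whitespace or ">"),
            // $m[2] is the www address itself.
            return $m[1] . '<a href="http://' . $m[2] . '" target="_blank">' . $m[2] . '</a>';
        },
        $text
    );
}

echo linkWwwHosts('See www.example.com today');
```

The callback receives the same capture groups that `\\1` and `\\2` referred to in the `/e` version, so no stripslashes() juggling is needed.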

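While I don't relish the idea of a lot of back-and-forth through a long series of answer edits, I would like to give you some traction.

The method to follow is certainly not something I would stake my career on. As stated in the comments under the question and on many, many pages on SO, HTML should not be parsed with regex. (Disclaimer complete.)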

PHP5.4.34 Demo Link & Regex Pattern Demo Link

    $text='This has an img tag <img src="https://example.com/image.jpg"> that should be igrnored.
    This is an img that needs to become a tag: https://example.com/image.jpg.
    This is a <a href="https://www.example.com/image" target="_blank">tagged link</a> with target.
    This is a <a href="https://example.com/image?what=something&when=something">tagged link</a> without target.
    This is an untagged url http://example.com/image.jpg.
    (Please extend this battery of test cases to isolate any monkeywrenching cases)
    Another short url example.com/
    Another short url example.com/index.php?a=b&c=d
    Another www.example.com';
    // Existing <a ...> and <img ...> tags are consumed and discarded; anything else URL-like is captured.
    $pattern='~<(?:a|img)[^>]+?>(*SKIP)(*FAIL)|(((?:https?:)?(?:/{2})?)(w{3})?\S+(\.\S+)+\b(?:[?#&/]\S*)*)~';
    function taggify($m){
        // $m[4] is the last dot-segment of the match (e.g. ".jpg"), so anchor on the leading dot
        if(preg_match('/^\.(?:bmp|gif|png|jpe?g)\b/i',$m[4])){  // add more filetypes as needed
            return "<img src=\"{$m[0]}\">";
        }else{
            //var_export(parse_url($m[0]));  // if you need to do preparations, consider using parse_url()
            return "<a href=\"{$m[0]}\" target=\"_blank\">{$m[0]}</a>";
        }
    }
    $text=preg_replace_callback($pattern,'taggify',$text);
    echo $text;

Output:

    This has an img tag <img src="https://example.com/image.jpg"> that should be igrnored.
    This is an img that needs to become a tag: <img src="https://example.com/image.jpg">.
    This is a <a href="https://www.example.com/image" target="_blank">tagged link</a> with target.
    This is a <a href="https://example.com/image?what=something&when=something">tagged link</a> without target.
    This is an untagged url <img src="http://example.com/image.jpg">.
    (Please extend this battery of test cases to isolate any monkeywrenching cases)
    Another short url <a href="example.com/" target="_blank">example.com/</a>
    Another short url <a href="example.com/index.php?a=b&c=d" target="_blank">example.com/index.php?a=b&c=d</a>
    Another <a href="www.example.com" target="_blank">www.example.com</a>
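Note that the last three links in that output have no scheme in their `href`, so a browser would treat them as relative paths. If that matters for your system, one possible preparation step (along the lines of the parse_url() hint in the code comment) might look like the sketch below. The helper name `ensureScheme` and the choice of `http` as the default scheme are assumptions of mine, not part of the original code.

```php
<?php
// Hypothetical helper: returns a URL that is safe to use as an absolute href.
function ensureScheme($url)
{
    // With a component flag, parse_url() returns a string only when that component exists.
    if (is_string(parse_url($url, PHP_URL_SCHEME))) {
        return $url;
    }
    // Protocol-relative URLs ("//host/path") only need the scheme name added.
    if (strpos($url, '//') === 0) {
        return 'http:' . $url;
    }
    // Assumption: default to plain http for bare hosts such as "example.com/".
    return 'http://' . $url;
}

echo ensureScheme('example.com/index.php?a=b&c=d');
```

Inside a callback like taggify() you could then build the `href` from `ensureScheme($m[0])` while still showing `$m[0]` as the link text.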

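The SKIP-FAIL technique works to "disqualify" unwanted matches. Qualifying matches are expressed by the part of the pattern that follows the pipe (`|`) after `(*SKIP)(*FAIL)`.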
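To see the technique in isolation, here is a stripped-down sketch. The URL half of the pattern is deliberately simpler than the one in the answer (it only accepts `http://` or `https://`), and the function name `findBareUrls` is hypothetical.

```php
<?php
// Hypothetical helper: returns http(s) URLs that are not inside an <a ...> or <img ...> tag.
function findBareUrls($text)
{
    // Left of the pipe: consume a whole opening tag, then force a failure.
    // (*SKIP) stops the engine from retrying at any position inside the consumed tag.
    // Right of the pipe: the matches we actually want to keep.
    $pattern = '~<(?:a|img)[^>]+>(*SKIP)(*FAIL)|https?://[^\s<>"]+~';
    preg_match_all($pattern, $text, $matches);
    return $matches[0];
}

$sample = 'Keep <img src="https://example.com/a.png"> but find https://example.com/b here';
print_r(findBareUrls($sample));
```

Only `https://example.com/b` is returned; the URL inside the `src` attribute is swallowed together with its tag and thrown away.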

PHP - Parsing URLs in mail while ignoring all HTML tags
