Strip links in a string

Time: 2014-07-31 08:18:13

Tags: php python regex

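I'm trying to create a strict chat filter for users who opt into the strict version. I want to block all URLs except a few whitelisted ones (youtube, prntscr, facebook, etc.) to prevent people from sending porn, IP grabbers, virus downloads, and so on.

I know I could do this with some extra code, but is there a way to achieve it with a regular expression? I want to check whether a string contains a URL, where that URL is not a whitelisted one (e.g. youtube.com).

I'd like to implement this in both Python and PHP, but I only need the regex, since I know how to use regular expressions in both languages.

Thanks

Edit: To be clear, this is for the strict mode of a chat system. A message sent by a user can be anything from "Hello" to "http://unsafelink.com go there!!".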

1 answer:

Answer 0 (score: 0)

Check the snippet below:

$message = array(
  "Dude I saw her on youtube",
  "I just opened an account on youtuber.com",
  "I'm watching an amazing prank, check this out youtube.com/gfsddfh784",
  "Dude, isn't this girl forbidden.com/hot-chick/123 Mery from our school?",
  "Take a look google.com?search=how%20to%20hack%20a%20wireless",
  "Ask someone on stackoverflow.com :p",
  "I found this great snippet on stackoverflow!",
  "He's all day on xxx.net"
  );

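// Loose, lowercase-only pattern for URL-like text: an optional "http(s)://www"
// prefix, a "name.tld" pair, then any path or query characters.
$url = '/(((https?:\/\/)?www)?\.?[a-z0-9]+\.[a-z0-9]+[a-z0-9\-\/?&#%=]+)/';
// Whitelisted names, matched as whole words anywhere in the URL found above
// (path and query included, not only the domain).
$whitelist = "/\b(youtube|stackoverflow|google|twitter|facebook|prntscr)\b/";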

// check messages like this: label the first URL found in each message
foreach ($message as $line) {
  if (preg_match($url, $line, $match)) {
    echo $match[0], preg_match($whitelist, $match[0]) ? " -> Safe" : " -> Unsafe", '<br />';
  }
}

echo "<hr />";

// or like this: report only the messages whose first URL is not whitelisted
foreach ($message as $line) {
  if (preg_match($url, $line, $match) && !preg_match($whitelist, $match[0])) {
    echo $match[0] . " -> Unsafe" . '<br />';
  }
}

Output:

youtuber.com -> Unsafe
youtube.com/gfsddfh784 -> Safe
forbidden.com/hot-chick/123 -> Unsafe
google.com?search=how%20to%20hack%20a%20wireless -> Safe
stackoverflow.com -> Safe
xxx.net -> Unsafe
------------------------------------------------------------
youtuber.com -> Unsafe
forbidden.com/hot-chick/123 -> Unsafe
xxx.net -> Unsafe
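
Since the question also asks for Python, here is a rough port of the PHP snippet, meant as a sketch rather than a definitive implementation. URL_RE and WHITELIST_RE are the same two patterns; the names classify, host_of and is_whitelisted_host, and the full domain names in ALLOWED_DOMAINS, are made up for illustration and are not from the original answer. One thing worth knowing: the whitelist pattern is tested against the whole matched URL, so a whitelisted name that shows up in the path also counts as safe. The second helper compares only the host part against full domain names, which is one possible way to tighten that.

```python
import re

# Same two patterns as the PHP snippet, written as Python raw strings.
URL_RE = re.compile(r"(((https?://)?www)?\.?[a-z0-9]+\.[a-z0-9]+[a-z0-9\-/?&#%=]+)")
WHITELIST_RE = re.compile(r"\b(youtube|stackoverflow|google|twitter|facebook|prntscr)\b")

# Assumption: the full domains behind the whitelisted names (illustrative only).
ALLOWED_DOMAINS = {
    "youtube.com", "stackoverflow.com", "google.com",
    "twitter.com", "facebook.com", "prntscr.com",
}


def classify(message):
    """Return (url, is_safe) for the first URL-like match, or None if there is none."""
    match = URL_RE.search(message)
    if match is None:
        return None
    url = match.group(0)
    return url, WHITELIST_RE.search(url) is not None


def host_of(url):
    """Strip an optional scheme, then cut at the first '/', '?' or '#'."""
    rest = re.sub(r"^https?://", "", url)
    host = re.split(r"[/?#]", rest, maxsplit=1)[0]
    return host.lstrip(".").lower()


def is_whitelisted_host(url):
    """Stricter check: the host must be an allowed domain or a subdomain of one."""
    host = host_of(url)
    return any(host == d or host.endswith("." + d) for d in ALLOWED_DOMAINS)
```

With these definitions, classify("see forbidden.com/youtube") returns ("forbidden.com/youtube", True) because "youtube" appears in the path, while is_whitelisted_host("forbidden.com/youtube") returns False. Like the PHP version, classify only looks at the first URL in a message, so a loop over URL_RE.finditer would be needed to check every link.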