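I'm trying to create a strict chat filter for users who opt into the strict version. I want to block all URLs except a few whitelisted ones (youtube, prntscr, facebook, etc.) to stop people from sending porn, IP grabbers, virus downloads, and so on.
I know I could do this with some extra code, but is there a way to achieve it with a regular expression? I want to check whether a string contains a URL that is not on the whitelist (for example, youtube.com).
I want to implement it in both Python and PHP, but I only need the regex, since I know how to use regular expressions in both languages.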
Thanks.
Edit: to be clear, this is for the strict mode of a chat system. A message sent by a user can be anything from "Hello" to "http://unsafelink.com go there!!".
Answer 0 (score: 0)
Check the snippet below:
// Sample chat messages to classify.
$message = array(
    "Dude I saw her on youtube",
    "I just opened an account on youtuber.com",
    "I'm watching an amazing prank, check this out youtube.com/gfsddfh784",
    "Dude, isn't this girl forbidden.com/hot-chick/123 Mery from our school?",
    "Take a look google.com?search=how%20to%20hack%20a%20wireless",
    "Ask someone on stackoverflow.com :p",
    "I found this great snippet on stackoverflow!",
    "He's all day on xxx.net"
);

// Loose URL detector: optional scheme and "www", then "name.tld" plus any path or query characters.
$url = '/(((https?:\/\/)?www)?\.?[a-z0-9]+\.[a-z0-9]+[a-z0-9\-\/?&#%=]+)/';
// Whitelisted site names, matched as whole words inside the detected URL.
$whitelist = "/\b(youtube|stackoverflow|google|twitter|facebook|prntscr)\b/";

// check messages like this: label the first URL found in each message
// (iterate by value; the loops never modify the array, so no reference is needed)
foreach ($message as $line) {
    if (preg_match($url, $line, $match)) {
        echo $match[0], preg_match($whitelist, $match[0]) ? " -> Safe" : " -> Unsafe", '<br />';
    }
}

echo "<hr />";

// or like this: report only the URLs that are not whitelisted
foreach ($message as $line) {
    if (preg_match($url, $line, $match) && !preg_match($whitelist, $match[0])) {
        echo $match[0] . " -> Unsafe" . '<br />';
    }
}
Output:
youtuber.com -> Unsafe
youtube.com/gfsddfh784 -> Safe
forbidden.com/hot-chick/123 -> Unsafe
google.com?search=how%20to%20hack%20a%20wireless -> Safe
stackoverflow.com -> Safe
xxx.net -> Unsafe
------------------------------------------------------------
youtuber.com -> Unsafe
forbidden.com/hot-chick/123 -> Unsafe
xxx.net -> Unsafe
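
Since the question also asks for Python, the same two-step check could be sketched roughly as follows. This is only an illustration under stated assumptions: the two patterns are copied from the PHP snippet above, while the names `URL_RE`, `WHITELIST_RE` and `unsafe_urls` are made up for this example. Unlike the `preg_match` loops, it uses `re.finditer`, so every URL-like substring in a message is checked rather than only the first one.

```python
import re

# Patterns copied from the PHP snippet (assumption: no case-insensitive flag,
# as in the original, so only lowercase URLs are detected).
URL_RE = re.compile(r"(((https?://)?www)?\.?[a-z0-9]+\.[a-z0-9]+[a-z0-9\-/?&#%=]+)")
WHITELIST_RE = re.compile(r"\b(youtube|stackoverflow|google|twitter|facebook|prntscr)\b")


def unsafe_urls(message):
    """Return every URL-like substring of `message` that is not whitelisted."""
    return [
        m.group(0)
        for m in URL_RE.finditer(message)
        if not WHITELIST_RE.search(m.group(0))
    ]


if __name__ == "__main__":
    for text in [
        "I just opened an account on youtuber.com",
        "check this out youtube.com/gfsddfh784",
        "see youtube.com/abc and xxx.net",
    ]:
        print(text, "->", unsafe_urls(text) or "nothing to block")
```

A message would be blocked when `unsafe_urls` returns a non-empty list. Note that the whitelist pattern looks for a whitelisted word anywhere in the matched text, so something like `evil.com/youtube` would also pass. If that matters, the check would probably need to compare the host part only.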