How to dynamically filter website content using PHP

Time: 2010-08-29 15:37:48

Tags: php

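I am currently looking for a way to filter website content dynamically. By "dynamically" I mean that I would calculate the percentage of bad words (shit, f**k, etc.) among all the words on the first page, and allow the website if, say, that percentage does not exceed 30%. How can I make it go through every word on the first page, match each one against a list of bad words, and then divide the number of matches by the total number of words so that I get the percentage? The point is not to build a content filter that blocks a website as soon as a single word on the page matches the bad-word list. I do have the following, but it is static.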

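$filename = "filters.txt";

// Each line of filters.txt has the form pattern~replacement.
$fp = @fopen($filename, 'r');

if ($fp) {
    $array = explode("\n", fread($fp, filesize($filename)));
    fclose($fp);

    foreach ($array as $key => $val) {
        // Skip blank or malformed lines that have no "~" separator.
        if (strpos($val, '~') === false) {
            continue;
        }
        // explode() replaces split(), which was removed in PHP 7.
        list($before, $after) = explode("~", $val, 2);
        $input = preg_replace($before, $after, $input);
    }
}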

* filters.txt contains the list of bad words
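
For a percentage check, the same file could also be read into a plain array of words. This is only a minimal sketch: it assumes one entry per line, that the part before the optional "~" is the word itself rather than a full regular expression, and the helper name loadBadWords is hypothetical.

```php
<?php
// Hypothetical helper: returns the bad words listed in a filter file.
// Assumes one entry per line, optionally followed by "~replacement".
function loadBadWords(string $filename): array
{
    // file() returns false when the file cannot be read.
    $lines = @file($filename, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    if ($lines === false) {
        return [];
    }

    $words = [];
    foreach ($lines as $line) {
        // Keep only the part before the first "~", if there is one.
        $word = trim(explode('~', $line, 2)[0]);
        if ($word !== '') {
            $words[] = strtolower($word);
        }
    }

    // Drop duplicates and reindex from 0.
    return array_values(array_unique($words));
}
```

The result is an ordinary array of strings, which is the shape that functions such as str_ireplace() expect for their search argument.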


Thanks, Erisco!

I tried this, but it doesn't seem to work.

function get_content($url)
{
   $ch = curl_init();

   curl_setopt ($ch, CURLOPT_URL, $url);
   curl_setopt ($ch, CURLOPT_HEADER, 0);

   ob_start();

   curl_exec ($ch);
   curl_close ($ch);
   $string = ob_get_contents();

   ob_end_clean();

   return $string;    

}


/* $toLoad is from Browse.php */

$sourceOfWebpage = get_content($toLoad);
$textOfWebpage = strip_tags($sourceOfWebpage);

/* array: Obtained by your filter.txt file */
// Open the filters file and filter all of the results.

$filename =   "filters.txt";
$badWords = @fopen($filename, 'r');

if ($badWords) {
  $array = explode("\n", fread($fp, filesize($filename)));

  foreach($array as $key => $val){
    list($before,$after) = split("~",$val);
    $input = preg_replace($before,$after,$input);
  }
}

/* float: Some decimal value */

$allowedBadWordsPercent = 0.30;
$numberOfWords = str_word_count($textOfWebpage);
$numberOfBadWords = 0;
str_ireplace($badWords, '', $sourceOfWebpage, $numberOfBadWords);

if ($numberOfBadWords != 0) {
    $badWordsPercent = $numberOfWords / $numberOfBadWords;
} else {
    $badWordsPercent = 0;
}

if ($badWordsPercent > $allowedBadWordsPercent) {
    echo 'This is a naughty webpage';
}

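1 answer:

Answer 0 (score: 1)

This is a rough idea of what I would do. You might argue that using str_ireplace() purely for its count is an abuse of the function. I'm not sure whether there is a more direct function short of breaking out regular expressions.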

/* string: Obtained by CURL or similar */
$sourceOfWebpage;

$textOfWebpage = strip_tags($sourceOfWebpage);

/* array: Obtained by your filter.txt file */
$badWords;

/* float: Some decimal value */
$allowedBadWordsPercent = 0.30;

$numberOfWords = str_word_count($textOfWebpage);
$numberOfBadWords = 0;

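// Count matches in the tag-stripped text so both counts cover the same input.
str_ireplace($badWords, '', $textOfWebpage, $numberOfBadWords);

// Fraction of bad words: matches divided by total words (0 for an empty page).
if ($numberOfWords != 0) {
    $badWordsPercent = $numberOfBadWords / $numberOfWords;
} else {
    $badWordsPercent = 0;
}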

if ($badWordsPercent > $allowedBadWordsPercent) {
    echo 'This is a naughty webpage';
}
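
If counting with str_ireplace() feels indirect, or if a bad word found inside a longer harmless word should not count, the text can instead be split into words and each word looked up in the list. What follows is a rough sketch of that idea, not a tested filter: the function name badWordRatio and the sample words are made up for illustration.

```php
<?php
// Hypothetical helper: fraction of the words in $text that appear in $badWords.
function badWordRatio(string $text, array $badWords): float
{
    // Lower-case the text so the comparison is case-insensitive,
    // then let str_word_count() return the words as an array (format 1).
    $words = str_word_count(strtolower($text), 1);
    if (count($words) === 0) {
        return 0.0;
    }

    // Flip the list so each membership test is a hash lookup.
    $lookup = array_flip(array_map('strtolower', $badWords));

    $bad = 0;
    foreach ($words as $word) {
        if (isset($lookup[$word])) {
            $bad++;
        }
    }

    return $bad / count($words);
}

// Usage with placeholder words standing in for a real list.
$allowedBadWordsPercent = 0.30;
$text = strip_tags('<p>Darn, this darned page is heck</p>');

if (badWordRatio($text, ['darn', 'heck']) > $allowedBadWordsPercent) {
    echo 'This is a naughty webpage';
}
```

Only whole words are counted here, so "darned" does not match "darn". str_word_count() is built around single-byte alphabetic characters, so multibyte text would likely need a different way of splitting words.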