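Detecting stop words in a string

Date: 2012-02-04 21:48:10

Tags: php stop-words

I want to create a function in PHP that returns true when it finds certain bad words in a string.

Here is an example: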

function stopWords($string, $stopwords) {
    if (the words in the stopwords variable are found in the string) {
        return true;
    } else {
        return false;
    }
}

Assume that the $stopwords variable is an array of values, for example:

$stopwords = array('fuc', 'dic', 'pus');

How can I do this?

Thanks

3 answers:

Answer 0 (score: 1)

Use the strpos function.

// Assumes $stopwords is an array of strings, each a word that must not
// appear in $string.
function stopWords($string, $stopwords)
{
    // Input parameter validation omitted for brevity.

    // Check each word in the $stopwords array.
    foreach ($stopwords as $badWord) {
        // strpos() returns the match offset (possibly 0) when $badWord is
        // found and FALSE otherwise, hence the strict !== comparison.
        if (strpos($string, $badWord) !== FALSE) {
            return TRUE;
        }
    }
    // No bad word was found if we get here.
    return FALSE;
}
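
Note that strpos is case-sensitive and matches substrings, so 'Fuc' would slip through while 'confucius' would be flagged. The following is a minimal sketch of a case-insensitive variant, assuming that is the behavior wanted. The name stopWordsInsensitive and the empty-entry check are illustrative additions, not part of the answer above.

```php
<?php
// Hypothetical variant of stopWords(): same substring search, but ignoring case.
function stopWordsInsensitive(string $string, array $stopwords): bool
{
    foreach ($stopwords as $badWord) {
        // Skip empty entries (an assumption): an empty needle is not a useful stop word.
        if ($badWord === '') {
            continue;
        }
        // stripos() is the case-insensitive counterpart of strpos().
        if (stripos($string, $badWord) !== false) {
            return true;
        }
    }
    return false;
}

$stopwords = array('fuc', 'dic', 'pus');
var_dump(stopWordsInsensitive('what the Fuc?', $stopwords)); // bool(true)
var_dump(stopWordsInsensitive('hello world', $stopwords));   // bool(false)
var_dump(stopWordsInsensitive('confucius', $stopwords));     // bool(true): substring match
```

The last call shows the remaining limitation: a substring search cannot tell a whole word from part of a longer one.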

Answer 1 (score: 1)

Use regular expressions.

  • \b matches a word boundary; use it to match whole words only
  • Use the i flag to perform a case-insensitive match
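
As a quick illustration of those two points, here is a small sketch that calls preg_match directly on sample strings (the sample strings are chosen for illustration only):

```php
<?php
// Without \b the pattern also matches inside a longer word.
var_dump(preg_match('/fuc/', 'confucius'));          // int(1)
// With \b on both sides only a whole word matches.
var_dump(preg_match('/\bfuc\b/', 'confucius'));      // int(0)
// Without the i flag the match is case-sensitive.
var_dump(preg_match('/\bfuc\b/', 'what the Fuc?'));  // int(0)
// With the i flag, case is ignored.
var_dump(preg_match('/\bfuc\b/i', 'what the Fuc?')); // int(1)
```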

Matching each word separately:

function stopWords($string, $stopwords) {
    foreach ($stopwords as $stopword) {
        // preg_quote() escapes regex metacharacters and the "/" delimiter.
        $pattern = '/\b' . preg_quote($stopword, '/') . '\b/i';
        if (preg_match($pattern, $string)) {
            return true;
        }
    }
    return false;
}

$stopwords = array('fuc', 'dic', 'pus');

$bad = stopWords('confucius', $stopwords); // false: "fuc" is not a whole word here
$bad = stopWords('what the Fuc?', $stopwords); // true: whole word, case ignored

A shorter version, inspired by an answer to the question "determine if a string contains one of a set of words in an array", uses implode to build one large expression:

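function stopWords($string, $stopwords) {
    // Escape each stop word, then join them into a single alternation.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $stopwords);
    $pattern = '/\b(' . implode('|', $quoted) . ')\b/i';
    return preg_match($pattern, $string) > 0;
}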
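
One edge case of the combined pattern: with an empty $stopwords array the expression becomes /\b()\b/i, which matches at any word boundary, so the function reports a hit for any string that contains a word character. Below is a minimal sketch of a guarded variant, assuming an empty list should mean "nothing is forbidden". The name stopWordsCombined is hypothetical.

```php
<?php
// Hypothetical variant of the implode-based stopWords(): whole-word,
// case-insensitive matching with a single combined pattern.
function stopWordsCombined(string $string, array $stopwords): bool
{
    // Assumption: an empty list means nothing is forbidden. Without this
    // guard the pattern /\b()\b/i would match at any word boundary.
    if (count($stopwords) === 0) {
        return false;
    }
    // Escape regex metacharacters (and the "/" delimiter) in every stop word.
    $quoted = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $stopwords);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    return preg_match($pattern, $string) === 1;
}

$stopwords = array('fuc', 'dic', 'pus');
var_dump(stopWordsCombined('what the Fuc?', $stopwords)); // bool(true)
var_dump(stopWordsCombined('confucius', $stopwords));     // bool(false)
var_dump(stopWordsCombined('some text', array()));        // bool(false)
```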

Answer 2 (score: 0)

function stopWords($string, $stopwords) {
    $words=explode(' ', $string); //splits the string into words and stores it in an array
    foreach($stopwords as $stopword)//loops through the stop words array
    {
        if(in_array($stopword, $words)) {//if the current stop word exists 
            //in the words contained in $string then exit the function 
            //immediately and return true
            return true;
        }
    }
    //else if none of the stop words were in $string then return false
    return false;
}

I'm assuming $stopwords is an array to begin with. If it is not, it should be.
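
Because explode(' ', ...) splits on spaces only and in_array compares case-sensitively, a word followed by punctuation or written in a different case (such as 'Fuc?') would not be detected. The following is a rough sketch of a more forgiving tokenizer, assuming plain ASCII text. The name stopWordsTokenized is hypothetical.

```php
<?php
// Hypothetical variant of the explode()-based approach: tokenizes on any run
// of non-word characters and compares in lower case (ASCII text assumed).
function stopWordsTokenized(string $string, array $stopwords): bool
{
    // PREG_SPLIT_NO_EMPTY drops the empty strings produced by leading or
    // trailing punctuation.
    $words = preg_split('/\W+/', strtolower($string), -1, PREG_SPLIT_NO_EMPTY);
    $lowerStopwords = array_map('strtolower', $stopwords);
    // array_intersect() keeps the words that also appear in the stop list.
    return count(array_intersect($words, $lowerStopwords)) > 0;
}

$stopwords = array('fuc', 'dic', 'pus');
var_dump(stopWordsTokenized('what the Fuc?', $stopwords)); // bool(true)
var_dump(stopWordsTokenized('confucius', $stopwords));     // bool(false)
```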