PHP: Check a string for certain words

Asked: 2010-08-15 14:28:42

Tags: php

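How do I check whether data submitted from a form or the query string contains certain words?

I'm trying to find words such as admin, drop, create, etc. in [POST] data and query-string data so that I can accept or reject it.

I'm converting from ASP to PHP. In ASP I used an array (keeping all the illegal words in a string and using ubound to check the whole string for those words), but is there a better (more efficient) way to do this in PHP?

For example, a string like this would be rejected: "The admin dropped a blah blah blah because it has admin and drop in it."

I plan to use this to check usernames when accounts are created, among other things.

Thanks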

7 answers:

Answer 0 (score: 4)

You can use stripos():

int stripos ( string $haystack , string $needle [, int $offset = 0 ] )

You could use a function like this:

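function checkBadWords($str, $badwords) {
    // Pad both sides with spaces so only space-delimited whole words match;
    // stripos() makes the comparison case-insensitive.
    foreach ($badwords as $word) {
        if (stripos(" $str ", " $word ") !== false) {
            return false; // found a bad word: reject
        }
    }
    return true; // no bad words found
}

Usage:

if (!checkBadWords('something admin', array('admin'))) {
    // ...
}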
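Padding with spaces only finds a bad word that has a space on both sides, so "admin," or "admin." at the end of a clause slips through. A minimal sketch that instead splits the input on anything that is not a letter or digit and compares whole words case-insensitively; the function name hasBannedWord is hypothetical, and treating every non-alphanumeric character as a word separator is an assumption about what counts as a word:

```php
<?php
// Hypothetical helper: true if $str contains any of $badwords as a whole word,
// ignoring case. Any run of non-alphanumeric characters is treated as a separator.
function hasBannedWord(string $str, array $badwords): bool
{
    $words = preg_split('/[^a-z0-9]+/i', strtolower($str), -1, PREG_SPLIT_NO_EMPTY);
    $banned = array_map('strtolower', $badwords);
    // array_intersect() keeps only the words that also appear in the banned list.
    return count(array_intersect($words, $banned)) > 0;
}

var_dump(hasBannedWord('The admin, dropped a blah', array('admin', 'drop'))); // bool(true)
var_dump(hasBannedWord('administrator', array('admin')));                    // bool(false)
```

Unlike the space-padding version, "dropped" is not flagged by "drop" here, because only exact whole words are compared.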

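Answer 1 (score: 3)

strpos() lets you search for a substring within a larger string. It's fast and works well. It returns false if the substring isn't found, or its position (an integer) if it is. That position may be zero, so you need to compare with === false or !== false rather than relying on truthiness.

stripos() is the case-insensitive version.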
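A short illustration of why the strict comparison matters: when the needle sits at the very start of the haystack, strpos() returns 0, which a loose truthiness check treats as "not found".

```php
<?php
$haystack = 'admin panel';

var_dump(strpos($haystack, 'admin'));      // int(0): found at offset 0
var_dump(strpos($haystack, 'drop'));       // bool(false): not found
var_dump(stripos('Admin panel', 'admin')); // int(0): case-insensitive match

// Wrong: 0 is falsy, so a match at offset 0 is reported as "not found".
if (strpos($haystack, 'admin')) {
    echo "loose check: found\n";
} else {
    echo "loose check: not found\n";   // this branch runs
}

// Right: compare against false with !== so offset 0 still counts as a match.
if (strpos($haystack, 'admin') !== false) {
    echo "strict check: found\n";      // this branch runs
}
```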

"I'm trying to find words such as admin, drop, create, etc. in [POST] data and query-string data so that I can accept or reject it."

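I suspect you're trying to filter strings so they're safe to include in something like a database query. If that's the case, this probably isn't a good approach; what you actually need is to escape the string with mysql_real_escape_string() or an equivalent (the old mysql_* extension was removed in PHP 7, so today that means mysqli_real_escape_string() or, better, prepared statements).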
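If the goal is to keep user input from changing an SQL query, a prepared statement with bound parameters does that without any word list. A minimal sketch using PDO; the in-memory SQLite database and the users table are assumptions made only so the example is self-contained (it needs the pdo_sqlite extension):

```php
<?php
// Assumption: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE users (name TEXT NOT NULL)');

// Hostile-looking input is bound as plain data, never executed as SQL.
$username = "admin'; DROP TABLE users; --";

$insert = $pdo->prepare('INSERT INTO users (name) VALUES (:name)');
$insert->execute(array(':name' => $username));

$select = $pdo->prepare('SELECT name FROM users WHERE name = :name');
$select->execute(array(':name' => $username));
$stored = $select->fetchColumn();

echo $stored, "\n"; // the username, stored unchanged
// The table still exists, so this query succeeds.
echo $pdo->query('SELECT COUNT(*) FROM users')->fetchColumn(), "\n";
```

Checking usernames against a list of reserved words is still a separate, policy-level decision; parameter binding only guarantees the input cannot alter the query.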

Answer 2 (score: 2)

$badwords = array("admin", "drop",);
foreach (str_word_count($string, 1) as $word) {
    foreach ($badwords as $bw) {
        if (strpos($word, $bw) === 0) {
            //contains word $word that starts with bad word $bw
        }
    }
}
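To see what the loop above actually compares, here is a small worked example. str_word_count($string, 1) returns the words as an array, and strpos($word, $bw) === 0 asks whether a word starts with a bad word, so "dropped" is caught by "drop" but, because strpos() is case-sensitive, "Drop" is not.

```php
<?php
$string = 'Drop it: the admin dropped a blah';
$badwords = array('admin', 'drop');

$words = str_word_count($string, 1);
print_r($words); // Drop, it, the, admin, dropped, a, blah

foreach ($words as $word) {
    foreach ($badwords as $bw) {
        if (strpos($word, $bw) === 0) {
            echo "'$word' starts with bad word '$bw'\n";
        }
    }
}
// Flags 'admin' and 'dropped'; 'Drop' is missed because strpos() is case-sensitive.
```

Switching strpos() to stripos(), as the benchmark version below does, would also catch "Drop".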

In reply to JGB146, here is a performance comparison with the regular-expression approach:

<?php
function has_bad_words($badwords, $string) {

    foreach (str_word_count($string, 1) as $word) {
        foreach ($badwords as $bw) {
            if (stripos($word, $bw) === 0) {
                return true;
            }
        }
    }
    // Only reached after every word has been checked.
    return false;

}

function has_bad_words2($badwords, $string) {

    $regex = array_map(function ($w) {
        return "(?:\\b". preg_quote($w, "/") . ")"; }, $badwords);
    $regex = "/" . implode("|", $regex) . "/";
    return preg_match($regex, $string) != 0;

}

$badwords = array("abc", "def", "ghi", "jkl", "mnop");
$string = "The quick brown fox jumps over the lazy dog";

$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
 has_bad_words($badwords, $string);
}

echo "elapsed: ". (microtime(true) - $start);

$start = microtime(true);
for ($i = 0; $i < 10000; $i++) {
 has_bad_words2($badwords, $string);
}

echo "elapsed: ". (microtime(true) - $start);

Sample output:

elapsed: 0.076514959335327
elapsed: 0.29999899864197

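So in this run the regex version was roughly four times slower. Note that these timings come from a has_bad_words whose return false sat inside the outer loop, so it only ever examined the first word of $string; a version that checks every word does more work.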
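The pattern built by has_bad_words2 anchors only the start of each word with \b, so, like the loop, it matches prefixes ("administrator" is flagged by "admin"), and without the i modifier it is case-sensitive. If whole-word, case-insensitive matching is what you want, a sketch of a single-regex variant follows; the function name has_whole_bad_word is hypothetical:

```php
<?php
// Hypothetical helper: one alternation with \b on both sides, so only whole
// words match; the i modifier makes the match case-insensitive.
function has_whole_bad_word(array $badwords, string $string): bool
{
    if ($badwords === array()) {
        return false; // an empty alternation would match at any word boundary
    }
    // preg_quote() escapes regex metacharacters; '/' is passed as the delimiter.
    $quoted = array_map(function ($w) {
        return preg_quote($w, '/');
    }, $badwords);
    $regex = '/\b(?:' . implode('|', $quoted) . ')\b/i';
    return preg_match($regex, $string) === 1;
}

var_dump(has_whole_bad_word(array('admin', 'drop'), 'The ADMIN dropped a blah')); // bool(true)
var_dump(has_whole_bad_word(array('admin', 'drop'), 'administrator'));           // bool(false)
```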

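Answer 3 (score: 0)

You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that works on the whole array at once, and it lets you do the check in a single if statement.

Initially I thought you could do this with a single preg_match() call (hence the downvote), but preg_match() doesn't accept an array of patterns. Instead, you can use preg_replace(), which does, to replace every rejected string with nothing and then check whether the string changed. This is simple and avoids writing an explicit loop over the rejected strings.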

$rejectedStrs = array("/admin/", "/drop/", "/create/");
if($input == preg_replace($rejectedStrs, "", $input)) {
   //do stuff
} else { 
   //reject
}

Also note that you can make the search case-insensitive with the i modifier on the patterns, changing the pattern array to $rejectedStrs = array("/admin/i", "/drop/i", "/create/i");
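preg_replace() also accepts a fifth, by-reference $count argument that receives the number of replacements made, so you can test that count instead of comparing the whole string. A minimal sketch; the function name is_clean is hypothetical:

```php
<?php
// Hypothetical helper: true when none of the patterns match $input.
function is_clean(array $rejectedStrs, string $input): bool
{
    // -1 means "no limit"; $count receives the total number of replacements.
    preg_replace($rejectedStrs, '', $input, -1, $count);
    return $count === 0;
}

$rejectedStrs = array('/admin/i', '/drop/i', '/create/i');
var_dump(is_clean($rejectedStrs, 'The Admin dropped a blah')); // bool(false)
var_dump(is_clean($rejectedStrs, 'hello world'));              // bool(true)
```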

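Efficiency

How efficient this is compared with the accepted nested-loop approach is debatable. I ran some tests and found that the preg_replace method ran about twice as fast as the nested loop (about 0.59 s vs. 1.01 s for 10,000 iterations). Here is the code and output of those tests: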

$input = "You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement. You can certainly do a loop, as others have suggested. But I think you can get closer to the behavior you're looking for with an operation that directly uses arrays, plus it allows execution via a single if statement.";

$input = "Short string with no matches";
$input2 = "Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. Longer string with a lot more words but still no matches. ";
$input3 = "Short string which loop will match quickly";
$input4 = "Longer string that will eventually be matches but first has a lot of words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words, followed by more words and then more words and then finally the word create near the end";

$start1 = microtime(true);
$rejectedStrs = array("/loop/", "/operation/", "/create/");
$p_matches = 0;
for ($i = 0; $i < 10000; $i++) {
    if (preg_check($rejectedStrs, $input)) $p_matches++;
    if (preg_check($rejectedStrs, $input2)) $p_matches++;
    if (preg_check($rejectedStrs, $input3)) $p_matches++;
    if (preg_check($rejectedStrs, $input4)) $p_matches++;
}

$start2 = microtime(true);
$rejectedStrs = array("loop", "operation", "create");
$l_matches = 0;
for ($i = 0; $i < 10000; $i++) {
    if (loop_check($rejectedStrs, $input)) $l_matches++;
    if (loop_check($rejectedStrs, $input2)) $l_matches++;
    if (loop_check($rejectedStrs, $input3)) $l_matches++;
    if (loop_check($rejectedStrs, $input4)) $l_matches++;
}

$end = microtime(true);
echo "preg_match: ".$start1." ".$start2."= ".($start2-$start1)."\nloop_match: ".$start2." ".$end."=".($end-$start2);

function preg_check($rejectedStrs, $input) {
    // Returns true when NO pattern matched (the input is unchanged),
    // the opposite sense of loop_check() below.
    return $input == preg_replace($rejectedStrs, "", $input);
}

function loop_check($badwords, $string) {

    foreach (str_word_count($string, 1) as $word) {
        foreach ($badwords as $bw) {
            if (stripos($word, $bw) === 0) {
                return true;
            }
        }
    }
    // Only reached after every word has been checked.
    return false;

}

Output:

preg_match:1281908071.4032 1281908071.9947 = 0.5915060043335
loop_match:1281908071.9947 1281908073.006 = 1.0112948417664

Answer 4 (score: 0)

You can use a regular expression like this:

preg_match("~(admin)|(drop)|(another token)|(yet another)~",$subject);

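Building the pattern string from an array (preg_quote() escapes any regex metacharacters in the words):
$pattern = implode(")|(", array_map(function ($w) { return preg_quote($w, "~"); }, $banned_words));
$pattern = "~(" . $pattern . ")~";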

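Answer 5 (score: 0)

function check($string, $array) {
    foreach ($array as $item) {
        // preg_quote() keeps characters like . or ( in $item from being read as regex syntax.
        if (preg_match("/(" . preg_quote($item, "/") . ")/", $string))
            return true;
    }
    return false;
}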

Answer 6 (score: 0)

This is actually very simple: use substr_count().

Your example would be:

if (substr_count($variable_to_search, "drop"))
{
    echo "error";
}

To make things simpler, put your keywords (e.g. "drop", "create", "alter") in an array and use foreach to check each one. That way you cover all the words. An example:

foreach ($keywordArray as $keyword)
{
    if (substr_count($variable_to_search, $keyword))
    { 
        echo "error"; //or do whatever you want to do when you find something you don't like
    }
}
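substr_count() is case-sensitive, so "DROP" would not be caught by "drop". A minimal sketch that lower-cases both sides and reports which keywords were found, so the error message can name them; the function name find_keywords is hypothetical, and it assumes no keyword is an empty string (substr_count() rejects an empty needle):

```php
<?php
// Hypothetical helper: returns the keywords that occur anywhere in $text,
// ignoring case. Like substr_count(), this matches substrings, not whole words.
function find_keywords(string $text, array $keywordArray): array
{
    $haystack = strtolower($text);
    $found = array();
    foreach ($keywordArray as $keyword) {
        if (substr_count($haystack, strtolower($keyword)) > 0) {
            $found[] = $keyword;
        }
    }
    return $found;
}

$hits = find_keywords('DROP TABLE users; create something', array('drop', 'create', 'alter'));
if ($hits) {
    echo 'error: found ' . implode(', ', $hits) . "\n"; // error: found drop, create
}
```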