Question

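I use the code below on a few sites as a basic filter for user input, which is then saved to a .txt file for later use.

My problem is that I want it to remove a line if the line contains any of the words in $stopwords, not only when the whole line is an exact match.

For the purposes of this example I have changed $stopwords below, because I don't want this post to be treated as spam.

As an example, I want a line removed from the array if one of the stop words appears anywhere on that line.

So if a line contains bad, badgirl or notbad, I want that line removed.

Currently it has to be an exact match because I am using strcmp. What is the better/correct way to do this?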

$stopwords = "bad|badword|bad";
$stopwords = explode('|', $stopwords);
for ($i=0; $i<count($lines); $i++)
{
    $lines[$i] = substr($lines[$i], 0, -1);
    $lines[$i] = preg_replace('/(\s)+/', ' ', $lines[$i]);
    $lines[$i] = strtolower($lines[$i]);
    foreach($stopwords as $stopword)
    {
        if (0 == strcmp($lines[$i], $stopword))
        {
            unset($lines[$i]);
            //echo 'deleted'. $lines[$i];
        }
    }
    $lines[$i] = trim($lines[$i]);
}

Thanks for your help!

Answer 1

You can use strpos, like this:

if(strpos($lines[$i], $stopword) !== FALSE)
{
    unset($lines[$i]);
}

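Be sure to use !== rather than !=, because strpos can also return 0 (a match at the very start of the string), and 0 compares equal to false under loose comparison.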
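
For context, here is a minimal sketch of how that check could fit into the whole filter. The sample lines and the helper name containsStopword are made up for illustration, and it builds a new array instead of calling unset() inside the loop, which is my own choice rather than something the question requires. It also uses trim() in place of the question's substr($lines[$i], 0, -1).

```php
<?php
// Hypothetical sample data; $lines and $stopwords mirror the names in the question.
$lines = ["This is BAD\n", "a   good line\n", "she is a badgirl\n", "notbad at all\n"];
$stopwords = explode('|', 'bad|badword');

// Hypothetical helper: true if any stop word occurs anywhere in $line.
function containsStopword(string $line, array $stopwords): bool
{
    foreach ($stopwords as $stopword) {
        // strpos() returns 0 for a match at the start, so compare with !== false.
        if (strpos($line, $stopword) !== false) {
            return true;
        }
    }
    return false;
}

$clean = [];
foreach ($lines as $line) {
    // Same normalisation as the question: collapse whitespace, lowercase, trim.
    $line = strtolower(trim(preg_replace('/\s+/', ' ', $line)));
    if (!containsStopword($line, $stopwords)) {
        $clean[] = $line; // keep only lines without any stop word
    }
}

print_r($clean);
```

Collecting the surviving lines in $clean avoids two problems with unset() in a for loop: count($lines) changes while you iterate, and a later assignment to $lines[$i] would put the deleted entry back.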

Answer 2

Use regular expressions. They can match parts of words (and, if you like, whole words or sentences).

Read about it here: preg_match().

Example:

// The "i" after the pattern delimiter indicates a case-insensitive search
if (preg_match("/php/i", "PHP is the web scripting language of choice.")) {
    echo "A match was found.";
} else {
    echo "A match was not found.";
}
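
Applied to the question, one possible sketch is to build a single pattern from the stop words and filter the array with preg_grep(). The sample lines are invented, and using preg_quote() plus PREG_GREP_INVERT is my assumption about a reasonable way to wire it up, not the only one.

```php
<?php
// Hypothetical sample data, named after the variables in the question.
$lines = ["This is BAD", "a good line", "she is a badgirl", "notbad at all"];
$stopwords = explode('|', 'bad|badword');

// preg_quote() escapes characters that are special in a pattern; '/' is the delimiter.
$quoted = array_map(function ($word) {
    return preg_quote($word, '/');
}, $stopwords);

// Resulting pattern: /bad|badword/i (the "i" flag makes it case-insensitive).
$pattern = '/' . implode('|', $quoted) . '/i';

// PREG_GREP_INVERT returns the lines that do NOT match; array_values() reindexes.
$clean = array_values(preg_grep($pattern, $lines, PREG_GREP_INVERT));

print_r($clean);
```

Because the pattern is case-insensitive, the strtolower() step from the question is not needed just for matching.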

Answer 3

Since you already have an array of lines, use one of the following for matching instead of a plain string comparison:

$stopwords = "bad|badword|bad"; // convert to a regular expression

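Regular expressions, using preg_match().
Further split each line into words with explode().
Combine the two for greater accuracy (e.g. bbadword vs bodword). After explode() has produced the array of words, apply the regular expression to each word in a loop to find close matches against a threshold you define, e.g. 95% or more counts as a match. Depending on the data size and the algorithm used, this can run slowly.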
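
A rough sketch of the word-by-word idea follows. Note that it swaps in PHP's built-in similar_text() to get a percentage, since a regular expression on its own does not produce a similarity score. The function name hasNearMatch and the 90% threshold are assumptions for illustration only.

```php
<?php
// Hypothetical helper: true if any word in $line is at least $threshold percent
// similar to one of the stop words. similar_text() is an assumed choice of metric.
function hasNearMatch(string $line, array $stopwords, float $threshold): bool
{
    // Normalise whitespace and case, then split the line into words.
    $normalised = strtolower(trim(preg_replace('/\s+/', ' ', $line)));
    $words = explode(' ', $normalised);

    foreach ($words as $word) {
        foreach ($stopwords as $stopword) {
            // similar_text() writes the similarity percentage into $percent.
            similar_text($word, $stopword, $percent);
            if ($percent >= $threshold) {
                return true;
            }
        }
    }
    return false;
}

$stopwords = ['bad', 'badword'];
var_dump(hasNearMatch('this is a bbadword', $stopwords, 90.0));
var_dump(hasNearMatch('a bodword here', $stopwords, 90.0));
```

This compares every word against every stop word, so the cost grows with both the number of words and the number of stop words, which is the slowdown mentioned above. The threshold needs tuning on real data.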

How do I find a non-exact match in an array?
