Question

So I'm checking a string for a list of profanities.

E.g.

$string = 'naughty string';
$words = [
    'naughty',
    'example',
    'words'
];
$pattern = '/('.implode('|', $words).')/i';
preg_match_all($pattern, $string, $matches);
$matched = implode(', ', $matches[0]);

But I also want to check for profanities split up by spaces:

E.g.

n a u g h t y

Yes, I could do this by adding each spaced-out variant to the array:

$words = [
    'naughty',
    'n a u g h t y',
    'example',
    'e x a m p l e',
    'words',
    'w o r d s'
];

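But I have a large array of "bad" words and was wondering if there is an easier way to do this?

------ EDIT ------

So this isn't meant to be super accurate. In my application, each space becomes a new line. So a string like this: n a u g h t y string would come out like this: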

n

a

u

g

h

t

y

string

Answer 1

To answer the question, build a pattern like b\s*a\s*d for each word instead of just bad:

$string = 'some bad and b a d and more ugly and very u g l y words';

$words = [
    'bad',
    'ugly'
];

$pattern = '/\b(' . implode('|',
    array_map(function($w) {
        return implode('\s*', str_split($w));
    }, $words)) . ')\b/i';

print preg_replace($pattern, '***', $string); 
// some *** and *** and more *** and very *** words
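
The same idea can be wrapped in a small helper. This is only a sketch: buildSpacedPattern is a hypothetical name, not something from the question, and it assumes the words are plain single-byte (ASCII) strings, since str_split works on bytes. It also escapes each character with preg_quote so that a word containing regex metacharacters stays literal. Because \s matches newlines as well as spaces, it should also cover the case from the edit where every letter ends up on its own line.

```php
<?php
// Hypothetical helper: builds one case-insensitive pattern that matches
// each word with optional whitespace between its letters.
function buildSpacedPattern(array $words): string
{
    $alternatives = array_map(function (string $word): string {
        // Escape every character so regex metacharacters stay literal.
        $chars = array_map(function (string $c): string {
            return preg_quote($c, '/');
        }, str_split($word));
        return implode('\s*', $chars);
    }, $words);

    // Word boundaries sit outside the group so they apply to every word.
    return '/\b(?:' . implode('|', $alternatives) . ')\b/i';
}

$pattern = buildSpacedPattern(['naughty', 'example', 'words']);

// \s also matches newlines, so letters on separate lines are caught too.
$text = "n\na\nu\ng\nh\nt\ny string, for example";
preg_match_all($pattern, $text, $matches);
print_r($matches[0]);
```

Here $matches[0] holds two entries: the newline-separated "naughty" and the plain "example".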

More generally, you can't reliably remove profanity, especially in a Unicode world. There is no way to filter out things like ƒⓤçκ.
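
A quick way to see the limitation, using a harmless word: swap one Latin letter for a look-alike from another script and the pattern no longer matches. This is just an illustrative sketch; the look-alike string is my own example, not from the question.

```php
<?php
// Same style of pattern as above, for the single word "bad".
$pattern = '/\bb\s*a\s*d\b/iu';

// "\u{0430}" is CYRILLIC SMALL LETTER A, which looks like Latin "a".
$lookalike = "b\u{0430}d";

var_dump(preg_match($pattern, 'bad'));      // int(1)
var_dump(preg_match($pattern, $lookalike)); // int(0)
```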

Answer 2

Encode the words in the array with \s? between the letters to match an optional space, like this:

$words = [
    'n\s?a\s?u\s?g\s?h\s?t\s?y',
    'e\s?x\s?a\s?m\s?p\s?l\s?e',
    'w\s?o\s?r\s?d\s?s',
];

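Or you could use \s* to match any number of whitespace characters.

If you're not familiar with the nuances of regular expressions, I recommend checking out https://regex101.com/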
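
With a long list you probably don't want to type those by hand. A minimal sketch that derives the \s? form from the plain words, assuming they contain only ordinary ASCII letters (no regex metacharacters); the $spaced variable is just a name I picked:

```php
<?php
// Plain word list; the spaced-out patterns are derived from it.
$words = ['naughty', 'example', 'words'];

// 'words' becomes 'w\s?o\s?r\s?d\s?s', and so on.
$spaced = array_map(function (string $word): string {
    return implode('\s?', str_split($word));
}, $words);

$pattern = '/(' . implode('|', $spaced) . ')/i';

preg_match_all($pattern, 'a n a u g h t y string with w o r d s', $matches);
echo implode(', ', $matches[0]), PHP_EOL;
// n a u g h t y, w o r d s
```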

Regex to check for words and for words with space-separated letters
