Question

Can anyone help me with this regex? I'm not sure how to implement it.

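I need a regular expression that removes every word in a string that either contains at least one character that is not a UTF-8 letter, digit or punctuation mark, or has punctuation in the middle of the word (but not at the end).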

Examples:

This is Â®Aix string
A bad str?ng is here

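The first example contains ®, which is not a letter, digit or punctuation mark. The second example has punctuation in the middle of a word.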
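A quick way to confirm which class each character falls into is to test it against the Unicode property classes \p{L} (letters) and \p{P} (punctuation). This is a small sketch, assuming the PHP file is saved as UTF-8; it shows that Â is itself a letter, ? is punctuation, and ® is neither (it is a symbol, Unicode category So).

```php
<?php
// Classify the characters from the examples by Unicode general category.
// Assumes this file is saved as UTF-8 so 'Â' and '®' are valid UTF-8 strings.
foreach (['Â', '®', '?'] as $ch) {
    $isLetter = preg_match('/^\p{L}$/u', $ch); // 1 if a letter, else 0
    $isPunct  = preg_match('/^\p{P}$/u', $ch); // 1 if punctuation, else 0
    printf("%s letter=%d punct=%d\n", $ch, $isLetter, $isPunct);
}
// Expected output:
// Â letter=1 punct=0
// ® letter=0 punct=0
// ? letter=0 punct=1
```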

I need to remove these bad words but leave the rest of the string unchanged. For example: "This is string" and "A bad is here".

Note that "A bad string? is here" would not contain any bad words, because the punctuation is at the end of the word.
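One way to pin the rule down is as a per-word check rather than a single regex. In this sketch a word is kept only if it is one or more Unicode letters or digits, optionally followed by trailing punctuation. The function names is_bad_word and remove_bad_words are hypothetical, rejecting leading punctuation (e.g. "(word") is an assumption the question does not settle, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: true if $word breaks the rule from the question.
// Assumption: a "good" word is one or more Unicode letters or digits,
// optionally followed by trailing punctuation (\p{P}) and nothing else.
function is_bad_word(string $word): bool
{
    return preg_match('/^[\p{L}\p{N}]+\p{P}*$/u', $word) !== 1;
}

// Hypothetical helper: split on whitespace, drop bad words,
// and rejoin the remaining words with single spaces.
function remove_bad_words(string $subject): string
{
    $words = preg_split('/\s+/u', trim($subject), -1, PREG_SPLIT_NO_EMPTY);
    $kept = array_filter($words, fn (string $w): bool => !is_bad_word($w));
    return implode(' ', $kept);
}

echo remove_bad_words('This is Â®Aix string'), "\n";  // This is string
echo remove_bad_words('A bad str?ng is here'), "\n";  // A bad is here
echo remove_bad_words('A bad string? is here'), "\n"; // A bad string? is here
```

Because kept words are rejoined with single spaces, this collapses runs of whitespace, which a preg_replace-based approach would leave alone.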

Thanks in advance for your help.

Answer 1

How about this:

$result = preg_replace(
    '/\b            # Start of word
    [\p{L}\p{N}]+   # One or more Unicode letters or digits
    [^\s\p{L}\p{N}] # One character that is not a letter, digit or whitespace, followed by
    [^\s\p{P}]+     # at least one non-whitespace, non-punctuation character
    \b              # End of word
    \s*             # optional following whitespace
    /xu', 
    '', $subject);
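To try the pattern above on the three examples from the question, it can be stored in a variable and run in a loop. This is a usage sketch: $pattern and $samples are illustrative names, and the pattern is the same as above apart from reworded comments. It relies on PHP's u modifier, which in PHP also makes \b and \w Unicode-aware, so a word boundary is found before Â.

```php
<?php
// Same pattern as the answer (x = allow whitespace and comments, u = UTF-8).
$pattern = '/\b            # Start of word
    [\p{L}\p{N}]+   # One or more Unicode letters or digits
    [^\s\p{L}\p{N}] # One character that is not a letter, digit or whitespace
    [^\s\p{P}]+     # at least one non-whitespace, non-punctuation character
    \b              # End of word
    \s*             # optional following whitespace
    /xu';

$samples = [
    'This is Â®Aix string',
    'A bad str?ng is here',
    'A bad string? is here',
];

foreach ($samples as $subject) {
    echo preg_replace($pattern, '', $subject), "\n";
}
// Expected output:
// This is string
// A bad is here
// A bad string? is here
```

Judging by the pattern, the first example is removed only because Â counts as a letter: the regex needs at least one letter or digit before the offending character, so a word that starts with the bad character (such as a bare ®Aix) would be left in place.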

PHP RegEx to remove words containing non-letters/digits from a string
