Question

<?php

$badWords = array("ban","bad","user","pass","stack","name","html");

$string = "Hello my name is user.";

$matches = array();
$matchFound = preg_match_all(
                "/\b(" . implode("|", $badWords) . ")\b/i",
                $string, 
                $matches
              );

if ($matchFound) {
  echo "<ul>";
  $words = array_unique($matches[0]);
  foreach($words as $word) {
    echo "<li>" . $word . "</li>";
  }
  echo "</ul>";
}
?>

But when I change $badWords to Hebrew:

$badWords = array("עזה","חמאס");

and change the text ($string) to Hebrew as well:

$string = "חמאס רוצה להרוג אותנו ולא יצליח";

it doesn't work.

Why?

It works fine with English!

Answer 1

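You need to tell the regex engine that the pattern and the subject are UTF-8 strings, and you need to change the meaning of the character class \w and the word boundary \b so that they handle Unicode characters. By default \w contains only ASCII letters, digits, and the underscore, so the bytes of a Hebrew letter are not word characters and \b never matches next to them. There are two ways to do this: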

Use the u modifier:

$matchFound = preg_match_all(
            "/\b(" . implode("|", $badWords) . ")\b/iu",
            $string, 
            $matches
          );

Or put (*UTF8)(*UCP) at the very start of the pattern:

$matchFound = preg_match_all(
            "/(*UTF8)(*UCP)\b(" . implode("|", $badWords) . ")\b/i",
            $string, 
            $matches
          );

(*UTF8) tells the regex engine to treat the pattern and the subject as UTF-8 strings.

(*UCP) changes \w from its default [a-zA-Z0-9_] to [\p{L}\p{N}_].
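
As a rough sketch of the first approach, the snippet below wraps the check in a small helper and runs it with and without the u modifier. The function name findBadWords and its $unicode flag are made up for illustration and are not part of the original post. The preg_quote call is an addition too, on the assumption that the word list might contain regex metacharacters.

```php
<?php
// Hypothetical helper (not from the original post): returns the distinct
// bad words found in $text, matching whole words only.
function findBadWords(array $badWords, string $text, bool $unicode = true): array
{
    if (count($badWords) === 0) {
        return [];
    }

    // Escape each word so regex metacharacters are matched literally.
    $escaped = array_map(function ($word) {
        return preg_quote($word, '/');
    }, $badWords);

    // "i" = case-insensitive. "u" = treat pattern and subject as UTF-8,
    // which is what lets \b recognise non-ASCII letters as word characters.
    $modifiers = $unicode ? 'iu' : 'i';
    $pattern = '/\b(?:' . implode('|', $escaped) . ')\b/' . $modifiers;

    // preg_match_all returns 0 when nothing matches and false on error.
    if (!preg_match_all($pattern, $text, $matches)) {
        return [];
    }

    return array_values(array_unique($matches[0]));
}

$badWords = array("עזה", "חמאס");
$string = "חמאס רוצה להרוג אותנו ולא יצליח";

// Without "u" the Hebrew bytes are not word characters, so \b never matches.
print_r(findBadWords($badWords, $string, false));

// With "u" the first word of $string is reported.
print_r(findBadWords($badWords, $string));
```

The same helper still works for the English list from the question, since the u modifier does not change how ASCII words are matched.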

Hebrew strings don't work in preg_match_all