Question

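Is there a way to apply the replace method to Unicode text (Arabic, in this case)? In the example below, replacing whole words works fine on English text, but it fails to detect, and therefore to replace, Arabic words. I added the u flag to enable Unicode parsing, but that didn't help. In the Arabic example below, the word النجوم should be replaced but والنجوم should not; that doesn't happen.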

<!DOCTYPE html>
<html>
<body>
<p>Click to replace...</p>
<button onclick="myFunction()">replace</button>
<p id="demo"></p>
<script>
function myFunction() {
  var str = "الشمس والقمر والنجوم، ثم النجوم والنهار";
  var rep = 'النجوم';
  var repWith = 'الليل';

  //var str = "the sun and the stars, then the starsz and the day";
  //var rep = 'stars';
  //var repWith = 'night';

  var result = str.replace(new RegExp("\\b"+rep+"\\b", "ug"), repWith);
  document.getElementById("demo").innerHTML = result;
}
</script>
</body>
</html>

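Also, whatever solution you suggest, please keep using variables as in the code above (the variable rep), because these replacement words are passed in through function calls.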

Update: to try the code above, replace the code here with it.

Answer 1

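The \bword\b pattern can be expressed as (^|[^A-Za-z0-9_])word(?![A-Za-z0-9_]). Because group 1 consumes the character before the word, you need to add $1 before the replacement string when replacing matches.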
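The equivalence, and the reason the question's code fails, can be checked with a minimal sketch (plain JavaScript, browser console or Node.js assumed). In JavaScript, `\b` treats only ASCII letters, digits and `_` as word characters, and the `u` flag does not change that for Arabic letters, so no boundary ever exists around النجوم. The helper name `asciiWholeWord` is hypothetical.

```javascript
// \b in JavaScript regexes only knows ASCII word characters [A-Za-z0-9_].
const arabic = "الشمس والقمر والنجوم، ثم النجوم والنهار";
const english = "the sun and the stars, then the starsz and the day";

// Native \b: no boundary exists next to Arabic letters, so nothing is replaced.
const nativeArabic = arabic.replace(new RegExp("\\bالنجوم\\b", "ug"), "الليل");

// Hypothetical helper: emulates \bword\b with an explicit ASCII class.
// Group 1 captures the preceding character (or "" at the start of the string),
// so the replacement must put it back with $1.
function asciiWholeWord(str, rep, repWith) {
  const re = new RegExp("(^|[^A-Za-z0-9_])" + rep + "(?![A-Za-z0-9_])", "g");
  return str.replace(re, "$1" + repWith);
}

const emulatedEnglish = asciiWholeWord(english, "stars", "night");
console.log(nativeArabic === arabic); // true: \b never matched
console.log(emulatedEnglish); // the sun and the night, then the starsz and the day
```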

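Since you need Unicode support, it makes sense to use the XRegExp library, which supports the \pL shorthand for any base Unicode letter. You can replace A-Za-z in the pattern above with \pL: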

var str = "الشمس والقمر والنجوم، ثم النجوم والنهار";
var rep = 'النجوم';
var repWith = 'الليل';

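// Group 1 keeps the preceding non-letter (or the empty start), restored via $1;
// the lookahead forbids a letter, digit or underscore right after the word.
var regex = new XRegExp('(^|[^\\pL0-9_])' + rep + '(?![\\pL0-9_])');
var result = XRegExp.replace(str, regex, '$1' + repWith, 'all');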
console.log(result);

<script src="https://cdnjs.cloudflare.com/ajax/libs/xregexp/3.1.1/xregexp-all.min.js"></script>
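If adding XRegExp is not an option, engines that support ES2018 Unicode property escapes accept `\p{L}` natively when the `u` flag is set. The sketch below swaps XRegExp for that native syntax while keeping the answer's `(^|[^…])…(?![…])` structure and the `$1` trick, and it takes `rep` and `repWith` as function arguments as the question requires. The names `escapeRegExp` and `replaceWholeWord` are hypothetical; escaping `rep` is an extra precaution, since the word arrives at runtime and is concatenated into the pattern.

```javascript
// Assumes an ES2018+ engine: \p{L} (any Unicode letter) requires the "u" flag.

// Hypothetical helper: escape regex metacharacters so `rep` is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: replace `rep` only where it is not glued to another
// Unicode letter, ASCII digit or underscore on either side.
function replaceWholeWord(str, rep, repWith) {
  const re = new RegExp(
    "(^|[^\\p{L}0-9_])" + escapeRegExp(rep) + "(?![\\p{L}0-9_])",
    "gu"
  );
  // $1 restores the character consumed before the match.
  // Note: repWith is used as a replacement pattern, so "$" sequences in it are special.
  return str.replace(re, "$1" + repWith);
}

const result = replaceWholeWord(
  "الشمس والقمر والنجوم، ثم النجوم والنهار",
  "النجوم",
  "الليل"
);
console.log(result); // الشمس والقمر والنجوم، ثم الليل والنهار
```

Only the standalone النجوم is replaced; والنجوم is left alone because the preceding و is a letter.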

@mohsenmadi Update: to integrate this into an Angular application, follow these steps:

1. Run npm install xregexp to add the library to package.json.
2. Inside the component, add import { replace, build } from 'xregexp/xregexp-all.js';
3. Build the regex with let regex = build('(^|[^\\pL0-9_])' + rep + '(?![\\pL0-9_])');
4. Replace with: let result = replace(str, regex, '$1' + repWith, 'all');

Answer 2

If you are willing to redefine the boundary as a whitespace boundary, this is the regex:

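var str = "الشمس والقمر والنجوم، ثم النجوم والنهار";
var rep = 'النجوم';
var repWith = 'الليل';

// Group 1: start of string or a Unicode whitespace character (restored via $1).
// The negative lookahead requires whitespace or end of string after the word.
var Rx = new RegExp(
   "(^|[\\u0009-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u2000-\\u200A\\u2028-\\u2029\\u202F\\u205F\\u3000])"
   + rep +
   "(?![^\\u0009-\\u000D\\u0020\\u0085\\u00A0\\u1680\\u2000-\\u200A\\u2028-\\u2029\\u202F\\u205F\\u3000])"
   ,"ug");

var result = str.replace( Rx, '$1' + repWith );
console.log(result);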

Regex explanation

 (                             # (1 start), simulated whitespace boundary
      ^                             # BOL
   |                              # or whitespace
      [\u0009-\u000D\u0020\u0085\u00A0\u1680\u2000-\u200A\u2028-\u2029\u202F\u205F\u3000] 
 )                             # (1 end)

 text                          # To find

 (?!                           # Whitespace boundary
      [^\u0009-\u000D\u0020\u0085\u00A0\u1680\u2000-\u200A\u2028-\u2029\u202F\u205F\u3000] 
 )

In an engine that supports lookbehind assertions, a whitespace boundary
is usually written as (?<!\S)text(?!\S).
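JavaScript has supported lookbehind assertions since ES2018, so in a current engine the whitespace boundary can be written in that shorter form. A minimal sketch, reusing the question's `rep`/`repWith` variables: because lookarounds consume nothing, no `$1` prefix is needed. JavaScript's `\s` includes U+FEFF but not U+0085, so `\S` differs slightly from the explicit list above.

```javascript
// Assumes an engine with lookbehind support (ES2018+).
var str = "الشمس والقمر والنجوم، ثم النجوم والنهار";
var rep = "النجوم";
var repWith = "الليل";

// Whitespace boundary: not preceded and not followed by a non-whitespace char.
var rx = new RegExp("(?<!\\S)" + rep + "(?!\\S)", "gu");

// Lookarounds consume nothing, so the replacement needs no $1 prefix.
var result = str.replace(rx, repWith);
console.log(result); // الشمس والقمر والنجوم، ثم الليل والنهار
```

Like the explicit version, this does not match a word directly followed by punctuation, such as النجوم followed by the Arabic comma ، because the comma is not whitespace.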

Using regex replace / replace all on Unicode
