Comparing non-English characters in strings

Date: 2020-06-15 15:33:16

Tags: javascript node.js string-matching

I need to compare the following non-English strings:

Majsstärkelse  Unicode - Majsst&#228;rkelse

Majsstärkelse  Unicode - Majssta&#776;rkelse

If I compare them like this:

if('Majsstärkelse' === 'Majsstärkelse')

the comparison fails for some characters. So I tried:

const collator = new Intl.Collator('de')
const order = collator.compare('Ü', 'ß')
console.log(order)

But I still don't get the expected result. How can I achieve this?

1 Answer:

Answer 0: (score: 1)

You can use String.prototype.normalize to bring canonically equivalent strings into the same form before comparing them.

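const a = 'Majsst\u{00E4}rkelse'   // precomposed ä (U+00E4)
const b = 'Majssta\u{0308}rkelse'  // a followed by a combining diaeresis (U+0308)
console.log(a, b)                  // both render as Majsstärkelse
console.log(a === b)               // false: the code units differ
console.log(a.normalize('NFC') === b.normalize('NFC'))  // true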
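
To see why the plain === fails, it can help to look at the code points directly. The following is only a minimal sketch: the names precomposed, decomposed, codePoints and canonicallyEqual are made up for illustration, and only normalize, codePointAt and Intl.Collator are built in. It also checks the same pair with the collator from the question, on the assumption that the runtime ships with Intl support (standard Node.js builds do).

```javascript
// Hypothetical names for illustration; the two strings are the ones from the answer above.
const precomposed = 'Majsst\u{00E4}rkelse'  // ä stored as a single code point
const decomposed = 'Majssta\u{0308}rkelse'  // a followed by a combining diaeresis

// List the code points of a string as 4-digit hex, to make the difference visible.
const codePoints = s => [...s].map(c => c.codePointAt(0).toString(16).padStart(4, '0'))

console.log(precomposed.length, decomposed.length)  // 13 14
console.log(codePoints(precomposed).slice(5, 7))    // [ '0074', '00e4' ]
console.log(codePoints(decomposed).slice(5, 8))     // [ '0074', '0061', '0308' ]

// Compare after normalizing both sides to the same form (NFC here; NFD would work too).
const canonicallyEqual = (x, y) => x.normalize('NFC') === y.normalize('NFC')
console.log(canonicallyEqual(precomposed, decomposed))  // true

// Intl.Collator treats canonically equivalent strings as equal, so compare() gives 0 here,
// while a genuinely different string (plain 'a' instead of 'ä') still compares as non-zero.
const collator = new Intl.Collator('de')
console.log(collator.compare(precomposed, decomposed))       // 0
console.log(collator.compare(precomposed, 'Majsstarkelse') !== 0)  // true
```

So the collator in the question was a workable tool; it just has to be given the two full strings rather than 'Ü' and 'ß'. If you only need an equal/not-equal answer, normalize plus === is the simpler option.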

Note: the strings you have are escaped; the code above compares unescaped strings.
Here is code that first decodes the numeric (Unicode) HTML entities:

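// Replace &#NNN; (decimal) and &#xHHH; (hex, either case) with the character they encode.
const decodeUEntities = s => s.replace(/&#(x[\da-f]+|\d+);/gi,
  (_, code) => String.fromCodePoint(/^x/i.test(code) ? parseInt(code.slice(1), 16) : +code))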

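// Entity forms reconstructed from the escapes used above: U+00E4, and a + U+0308.
const str1 = decodeUEntities("Majsst&#228;rkelse")
const str2 = decodeUEntities("Majssta&#776;rkelse")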

// This decodes numeric HTML entities only. If you want named HTML entities too, find a list
// of them and add them to the replacement code; for simplicity I am leaving that out.
console.log(str1, str2, str1===str2)

console.log(str1.normalize('NFC'),str2.normalize('NFC'),
            str1.normalize('NFC')===str2.normalize('NFC'))
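
If named entities matter too, one possible extension is a lookup table consulted from the same replace callback. This is only a sketch under stated assumptions: NAMED_ENTITIES is a deliberately tiny, hypothetical table (a real one would cover the full HTML entity list), and decodeEntities and sameText are illustrative names, not part of any library.

```javascript
// Hypothetical, deliberately tiny table; extend it with whatever named entities your data contains.
const NAMED_ENTITIES = new Map([
  ['auml', '\u{00E4}'],
  ['ouml', '\u{00F6}'],
  ['aring', '\u{00E5}'],
  ['amp', '&'],
])

// Decode numeric entities (&#228; or &#xE4;) and the named ones listed above.
// Unknown named entities are left untouched rather than guessed at.
// Note: String.fromCodePoint throws a RangeError for values above 0x10FFFF.
const decodeEntities = s =>
  s.replace(/&(#x[\da-f]+|#\d+|[a-z][a-z\d]*);/gi, (match, body) => {
    if (body[0] !== '#') return NAMED_ENTITIES.has(body) ? NAMED_ENTITIES.get(body) : match
    const isHex = body[1] === 'x' || body[1] === 'X'
    return String.fromCodePoint(isHex ? parseInt(body.slice(2), 16) : Number(body.slice(1)))
  })

// Decode first, then normalize both sides to NFC, then compare.
const sameText = (x, y) =>
  decodeEntities(x).normalize('NFC') === decodeEntities(y).normalize('NFC')

console.log(sameText('Majsst&auml;rkelse', 'Majssta&#776;rkelse'))  // true
console.log(sameText('Majsst&#xE4;rkelse', 'Majsst&#228;rkelse'))   // true
console.log(sameText('Majsst&auml;rkelse', 'Majsstarkelse'))        // false
```

A Map is used instead of a plain object so that a name such as constructor in the input cannot accidentally hit an inherited property. The order matters: decoding has to happen before normalize, because normalize only sees the literal characters &, # and digits until the entities are expanded.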