奇怪的String.unicodeScalars和CharacterSet行为

时间:2017-02-15 14:44:18

标签: swift

我试图使用Swift 3 CharacterSet来过滤String中的字符,但我很早就陷入困境。 CharacterSet有一个名为contains

的方法
  
    

func包含(_ member:UnicodeScalar) - >布尔
    测试CharacterSet中特定UnicodeScalar的成员资格。

  

但测试这并不能产生预期的行为。

let characterSet = CharacterSet.capitalizedLetters

let capitalAString = "A"

if let capitalA = capitalAString.unicodeScalars.first {
    print("Capital A is \(characterSet.contains(capitalA) ? "" : "not ")in the group of capital letters")
} else {
    print("Couldn't get the first element of capitalAString's unicode scalars")
}

我得到了Capital A is not in the group of capital letters但我却期待相反的事情。

非常感谢。

1 个答案:

答案 0 :(得分:7)

CharacterSet.capitalizedLetters 返回一个字符集,其中包含Unicode General Category Lt aka“Letter,titlecase”中的字符。那是 “包含大写字母后跟小写字母的连字符(例如,Dž,Lj,Nj和Dz)”(比较Wikipedia: Unicode character propertyUnicode® Standard Annex #44 – Table 12. General_Category Values)。

您可以在此处找到一个列表:Unicode Characters in the 'Letter, Titlecase' Category

您也可以使用来自的代码 NSArray from NSCharacterset转储角色的内容 设置:

extension CharacterSet {
    func allCharacters() -> [Character] {
        var result: [Character] = []
        for plane: UInt8 in 0...16 where self.hasMember(inPlane: plane) {
            for unicode in UInt32(plane) << 16 ..< UInt32(plane + 1) << 16 {
                if let uniChar = UnicodeScalar(unicode), self.contains(uniChar) {
                    result.append(Character(uniChar))
                }
            }
        }
        return result
    }
}

let characterSet = CharacterSet.capitalizedLetters
print(characterSet.allCharacters())

// ["Dž", "Lj", "Nj", "Dz", "ᾈ", "ᾉ", "ᾊ", "ᾋ", "ᾌ", "ᾍ", "ᾎ", "ᾏ", "ᾘ", "ᾙ", "ᾚ", "ᾛ", "ᾜ", "ᾝ", "ᾞ", "ᾟ", "ᾨ", "ᾩ", "ᾪ", "ᾫ", "ᾬ", "ᾭ", "ᾮ", "ᾯ", "ᾼ", "ῌ", "ῼ"]

你可能想要的是CharacterSet.uppercaseLetters

  

返回包含Unicode General Category Lu和Lt。

中字符的字符集