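I'm looking for a way to test, in Swift 4, whether a Character is a member of an arbitrary CharacterSet. I have this Scanner class that will be used for some lightweight parsing. One of its functions skips any characters at the current position that belong to a given set of possible characters.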
import Foundation

class MyScanner {
    let str: String
    var idx: String.Index

    init(_ string: String) {
        str = string
        idx = str.startIndex
    }

    var remains: String { return String(str[idx..<str.endIndex]) }

    func skip(charactersIn characters: CharacterSet) {
        // Does not compile as written: contains(_:) expects a Unicode.Scalar,
        // but str[idx] is a Character. This is the part the question is about.
        while idx < str.endIndex && characters.contains(str[idx]) {
            idx = str.index(idx, offsetBy: 1)
        }
    }
}

let scanner = MyScanner("fizz buzz fizz")
scanner.skip(charactersIn: CharacterSet.alphanumerics)
scanner.skip(charactersIn: CharacterSet.whitespaces)
print("what remains: \"\(scanner.remains)\"")
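I'd like to implement the skip(charactersIn:) function so that the code above prints buzz fizz.

The tricky part is characters.contains(str[idx]) in the while condition: .contains() requires a Unicode.Scalar, and I'm at a loss as to what the next step is.

I know I could pass a String to the skip function, but I'd like to find a way to make it work with CharacterSet, because of all the convenient static members (alphanumerics, whitespaces, etc.).

How does one test whether a CharacterSet contains a Character?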
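Answer 0 (score: 12)

Not sure if it's the most efficient way, but you can create a new CharacterSet from the character and check whether it is a sub/superset of the other set (set comparison is fairly fast).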
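
A minimal sketch of the subset idea described above. The helper name isMember(_:of:) is hypothetical and not part of the original answer; it assumes Foundation's CharacterSet(charactersIn:) and isSubset(of:). Note that a character made of several unicode scalars contributes each scalar to the temporary set separately.

```swift
import Foundation

// Hypothetical helper (illustration only): wraps the character in a
// one-character CharacterSet and tests it against the target set.
func isMember(_ character: Character, of set: CharacterSet) -> Bool {
    let single = CharacterSet(charactersIn: String(character))
    return single.isSubset(of: set)
}

print(isMember("a", of: .alphanumerics)) // true
print(isMember("a", of: .decimalDigits)) // false
print(isMember(" ", of: .whitespaces))   // true
```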
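Answer 1 (score: 6)

I know you wanted to use CharacterSet rather than String, but CharacterSet does not (yet, at least) support characters that are composed of more than one Unicode.Scalar. See the "family" character (👨‍👩‍👧‍👦) or the international flag characters (e.g. "🇯🇵" or "🇯🇲") that Apple demonstrated in the string discussion in the WWDC 2017 video What's New in Swift. Emoji with skin tone modifiers also exhibit this behavior.

Therefore, I would be wary of using CharacterSet (which is a set of "Unicode character values for use in search operations"). Or, if you want to provide this method for convenience, be aware that it will not work correctly with characters represented by multiple unicode scalars.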
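
To make the multi-scalar point concrete, here is a small sketch (not from the original answer) that counts the unicode scalars behind a plain letter and behind two flags. The variable names are illustrative only. It also shows why a per-scalar check can confuse two flags: the Japanese and Jamaican flags share their first scalar.

```swift
let letter: Character = "a"
// Flags are pairs of regional indicator scalars.
let japan: Character = "\u{1F1EF}\u{1F1F5}"   // J + P
let jamaica: Character = "\u{1F1EF}\u{1F1F2}" // J + M

print(letter.unicodeScalars.count) // 1
print(japan.unicodeScalars.count)  // 2

// Both flags start with the same scalar (regional indicator J),
// so testing only the first scalar cannot tell them apart.
print(japan.unicodeScalars.first == jamaica.unicodeScalars.first) // true
```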
So, you might offer a scanner that provides both CharacterSet and String renditions of the skip method:

import Foundation

class MyScanner {
    let string: String
    var index: String.Index

    init(_ string: String) {
        self.string = string
        index = string.startIndex
    }

    var remains: String { return String(string[index...]) }

    /// Skip characters in a string
    ///
    /// This rendition is safe to use with strings that have characters
    /// represented by more than one unicode scalar.
    ///
    /// - Parameter skipString: A string with all of the characters to skip.
    func skip(charactersIn skipString: String) {
        while index < string.endIndex, skipString.contains(string[index]) {
            index = string.index(index, offsetBy: 1)
        }
    }

    /// Skip characters in character set
    ///
    /// Note, character sets cannot (yet) include characters that are represented by
    /// more than one unicode scalar (e.g. a family emoji, a flag, or an emoji with a
    /// skin tone modifier). If you want to test for these multi-unicode characters,
    /// you have to use the `String` rendition of this method.
    ///
    /// This will simply stop scanning if it encounters a multi-unicode character in
    /// the string being scanned (because it knows the `CharacterSet` can only represent
    /// single-unicode characters) and you want to avoid false positives (e.g., mistaking
    /// the Jamaican flag, 🇯🇲, for the Japanese flag, 🇯🇵).
    ///
    /// - Parameter characterSet: The character set to check for membership.
    func skip(charactersIn characterSet: CharacterSet) {
        while index < string.endIndex,
            string[index].unicodeScalars.count == 1,
            let character = string[index].unicodeScalars.first,
            characterSet.contains(character) {
                index = string.index(index, offsetBy: 1)
        }
    }
}

Thus, your simple example still works:

let scanner = MyScanner("fizz buzz fizz")
scanner.skip(charactersIn: CharacterSet.alphanumerics)
scanner.skip(charactersIn: CharacterSet.whitespaces)
print(scanner.remains) // "buzz fizz"

But if the characters to skip may include multiple unicode scalars, use the String rendition:

let family = "👨\u{200D}👩\u{200D}👧\u{200D}👦" // 👨‍👩‍👧‍👦
let boy = "👦"
let charactersToSkip = family + boy
let string = boy + family + "foobar" // 👦👨‍👩‍👧‍👦foobar
let scanner = MyScanner(string)
scanner.skip(charactersIn: charactersToSkip)
print(scanner.remains) // foobar

As Michael Waterfall pointed out in the comments below, CharacterSet has a bug and does not even handle 32-bit Unicode.Scalar values correctly, which means it does not even properly handle single-scalar characters whose value exceeds 0xffff (emoji, among others). The String rendition above, however, handles these correctly.
Answer 2 (score: 2)

Swift 4.2

A CharacterSet extension function that checks whether the set contains a Character:
extension CharacterSet {
    func containsUnicodeScalars(of character: Character) -> Bool {
        return character.unicodeScalars.allSatisfy(contains(_:))
    }
}
Usage example:
CharacterSet.decimalDigits.containsUnicodeScalars(of: "3") // true
CharacterSet.decimalDigits.containsUnicodeScalars(of: "a") // false
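
One thing worth keeping in mind with this approach is that it tests every scalar of the character individually. The sketch below repeats the extension so it is self-contained and compares a precomposed "é" with a decomposed one (the letter e followed by a combining acute accent). The variable names are illustrative, and the expected results assume that CharacterSet.lowercaseLetters covers only Unicode general category Ll, as Apple's documentation describes.

```swift
import Foundation

extension CharacterSet {
    func containsUnicodeScalars(of character: Character) -> Bool {
        return character.unicodeScalars.allSatisfy(contains(_:))
    }
}

let precomposed: Character = "\u{E9}"  // é as a single scalar
let decomposed: Character = "e\u{301}" // e + combining acute accent

// Both render the same and compare equal as Characters...
print(precomposed == decomposed) // true

// ...but the combining accent is not itself a lowercase letter, so the
// per-scalar check gives different answers for the two forms.
print(CharacterSet.lowercaseLetters.containsUnicodeScalars(of: precomposed)) // true
print(CharacterSet.lowercaseLetters.containsUnicodeScalars(of: decomposed))  // false
```

If that difference matters for your input, normalizing the string first (for example with precomposedStringWithCanonicalMapping) may be one way to reduce it.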