What is the best way to test whether a CharacterSet contains a Character in Swift 4?

Date: 2017-08-24 23:33:27

Tags: swift swift4

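I'm looking for a way, in Swift 4, to test whether a Character is a member of an arbitrary CharacterSet. I have this Scanner class that will be used for some lightweight parsing. One of its functions skips any characters at the current position that belong to a given set of possible characters.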

import Foundation

class MyScanner {
  let str: String
  var idx: String.Index
  init(_ string: String) {
    str = string
    idx = str.startIndex
  }
  var remains: String { return String(str[idx..<str.endIndex])}

  func skip(charactersIn characters: CharacterSet) {
    // Does not compile: CharacterSet.contains(_:) expects a Unicode.Scalar,
    // but str[idx] is a Character. This is the part being asked about.
    while idx < str.endIndex && characters.contains(str[idx]) {
      idx = str.index(idx, offsetBy: 1)
    }
  }
}

let scanner = MyScanner("fizz   buzz fizz")
scanner.skip(charactersIn: CharacterSet.alphanumerics)
scanner.skip(charactersIn: CharacterSet.whitespaces)
print("what remains: \"\(scanner.remains)\"")

I would like to implement the skip(charactersIn:) function so that the code above prints buzz fizz.

The tricky part is the characters.contains(str[idx]) in the while loop: .contains() requires a Unicode.Scalar, and I'm at a loss as to what to do next.

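I know I could pass a String to the skip function instead, but I'd like to find a way to make it work with CharacterSet, because of all the convenient static members (alphanumerics, whitespaces, etc.).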

How can I test whether a CharacterSet contains a Character?

3 Answers:

Answer 0 (score: 12)

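Not sure whether it's the most efficient way, but you can create a new CharacterSet from the character and check whether it is a subset of the other set (set comparison is fairly fast).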
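The code for this answer did not survive extraction. A minimal sketch of the subset idea might look like the following. It assumes a hypothetical helper named isMember(_:of:), which wraps the Character in its own small CharacterSet and asks whether that set is a subset of the target set. For a Character made of several scalars, this amounts to "every scalar is in the set".

```swift
import Foundation

/// Hypothetical helper (not from the answer): true if every Unicode scalar
/// of `character` belongs to `set`, tested by building a tiny CharacterSet
/// from the character and checking it is a subset of `set`.
func isMember(_ character: Character, of set: CharacterSet) -> Bool {
    let single = CharacterSet(charactersIn: String(character))
    return single.isSubset(of: set)
}

print(isMember("a", of: .alphanumerics)) // true
print(isMember(" ", of: .alphanumerics)) // false
print(isMember(" ", of: .whitespaces))   // true
```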


Answer 1 (score: 6)

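I know you want to use CharacterSet rather than String, but CharacterSet does not support (at least not yet) characters that are composed of more than one Unicode.Scalar. See the "family" emoji (several person emoji joined by zero-width joiners), or the international flag characters that Apple demonstrated in the string discussion of the WWDC 2017 video What's New in Swift. Emoji with skin-tone modifiers behave the same way, because the modifier is a separate scalar.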
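To see why, here is a small check that uses only the standard library. The escapes below are my own spelled-out versions of the Jamaican and Japanese flags and of a family emoji. Each one is a single Character, but it is made of several Unicode.Scalar values, and the two flags even share their first scalar.

```swift
// Pure standard library: no Foundation needed.
let jamaica = "\u{1F1EF}\u{1F1F2}"   // regional indicators J + M
let japan   = "\u{1F1EF}\u{1F1F5}"   // regional indicators J + P
let family  = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"

// Each string is one Character (extended grapheme cluster)...
print(jamaica.count, japan.count, family.count)                   // 1 1 1
// ...but consists of several Unicode scalars.
print(jamaica.unicodeScalars.count, family.unicodeScalars.count)  // 2 7
// The two flags start with the same scalar.
print(jamaica.unicodeScalars.first == japan.unicodeScalars.first) // true
```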

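So I would be wary of using CharacterSet (which is a set of "Unicode character values for use in search operations"). Or, if you want to provide this method for convenience, be aware that it does not work correctly for characters represented by multiple Unicode scalars.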
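The following is a small illustration of that risk. It is my own example and does not come from the answer. A naive check that looks only at a Character's first scalar reports a decomposed "é" (an "e" followed by U+0301 COMBINING ACUTE ACCENT) as a member of a set that contains only "e". Requiring exactly one scalar, as the CharacterSet version of skip in this answer does, avoids that false positive. Every scalar here lies below 0xffff, so the example does not depend on the CharacterSet bug noted at the end of this answer.

```swift
import Foundation

// "é" written as two scalars: "e" followed by U+0301 COMBINING ACUTE ACCENT.
let decomposed: Character = "e\u{301}"
let plainE = CharacterSet(charactersIn: "e")

// Naive test: look only at the first scalar. This wrongly reports a match.
let naive = decomposed.unicodeScalars.first.map { plainE.contains($0) } ?? false

// Guarded test: only trust CharacterSet for single-scalar characters.
let guarded = decomposed.unicodeScalars.count == 1
    && plainE.contains(decomposed.unicodeScalars.first!)

print(naive, guarded) // true false
```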

So you might provide a scanner that offers both CharacterSet and String renditions of the skip method:

import Foundation

class MyScanner {
    let string: String
    var index: String.Index

    init(_ string: String) {
        self.string = string
        index = string.startIndex
    }

    var remains: String { return String(string[index...]) }

    /// Skip characters in a string
    ///
    /// This rendition is safe to use with strings that have characters
    /// represented by more than one unicode scalar.
    ///
    /// - Parameter skipString: A string with all of the characters to skip.

    func skip(charactersIn skipString: String) {
        while index < string.endIndex, skipString.contains(string[index]) {
            index = string.index(index, offsetBy: 1)
        }
    }

    /// Skip characters in character set
    ///
    /// Note, character sets cannot (yet) include characters that are represented by
    /// more than one unicode scalar (e.g. family or flag emoji). If you want to test
    /// for these multi-unicode characters, you have to use the `String` rendition of
    /// this method.
    ///
    /// This will simply stop scanning if it encounters a multi-unicode character in
    /// the string being scanned, because a `CharacterSet` can only represent
    /// single-unicode characters and we want to avoid false positives (e.g., mistaking
    /// the Jamaican flag, 🇯🇲, for the Japanese flag, 🇯🇵).
    ///
    /// - Parameter characterSet: The character set to check for membership.

    func skip(charactersIn characterSet: CharacterSet) {
        while index < string.endIndex,
            string[index].unicodeScalars.count == 1,
            let character = string[index].unicodeScalars.first,
            characterSet.contains(character) {
                index = string.index(index, offsetBy: 1)
        }
    }

}

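So your simple example still works:

let scanner = MyScanner("fizz buzz fizz")
scanner.skip(charactersIn: CharacterSet.alphanumerics)
scanner.skip(charactersIn: CharacterSet.whitespaces)
print(scanner.remains) // "buzz fizz"

But if the characters you want to skip might include ones made of multiple Unicode scalars, use the String rendition: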


let family = "👨\u{200D}👩\u{200D}👧\u{200D}👦"   // 👨‍👩‍👧‍👦
let boy = "👦"
let charactersToSkip = family + boy

let string = boy + family + "foobar"              // 👦👨‍👩‍👧‍👦foobar

let scanner = MyScanner(string)
scanner.skip(charactersIn: charactersToSkip)
print(scanner.remains)                            // foobar

As Michael Waterfall pointed out in a comment, CharacterSet has a bug: it does not correctly handle 32-bit Unicode.Scalar values. This means it mishandles even single-scalar characters whose value exceeds 0xffff (emoji among them). The String rendition above, however, handles these correctly.
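If the linear String.contains lookup in the String rendition ever matters, one variant keeps the same whole-Character matching but stores the characters to skip in a Set<Character>. This is my own suggestion, not part of the answer. The sketch below uses a hypothetical free function, skipIndex(in:from:skipping:), instead of the scanner class. It reproduces the family/boy example without involving CharacterSet at all.

```swift
/// Hypothetical helper: returns the first index at or after `start` whose
/// Character is not in `skipSet`. Matching is by whole Character (grapheme
/// cluster), so multi-scalar characters such as family emoji work.
func skipIndex(in string: String, from start: String.Index,
               skipping skipSet: Set<Character>) -> String.Index {
    var index = start
    while index < string.endIndex, skipSet.contains(string[index]) {
        index = string.index(after: index)
    }
    return index
}

let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F466}"
let boy = "\u{1F466}"
let text = boy + family + "foobar"
let toSkip = Set(family + boy)   // Set<Character> holding two elements
let rest = text[skipIndex(in: text, from: text.startIndex, skipping: toSkip)...]
print(rest) // foobar
```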

Answer 2 (score: 2)

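A Swift 4.2 CharacterSet extension that checks whether the set contains a Character (more precisely, whether it contains every Unicode scalar of that Character):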

import Foundation

extension CharacterSet {
    func containsUnicodeScalars(of character: Character) -> Bool {
        return character.unicodeScalars.allSatisfy(contains(_:))
    }
}

Usage example:

CharacterSet.decimalDigits.containsUnicodeScalars(of: "3") // true
CharacterSet.decimalDigits.containsUnicodeScalars(of: "a") // false
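Plugging this extension into the scanner from the question is one way to get the behavior the question asks for. The sketch below is my own combination and does not come from the answer. It treats a multi-scalar Character as a member when all of its scalars are in the set. That rule is looser than the single-scalar guard in answer 1. The extension is repeated so that the block compiles on its own.

```swift
import Foundation

// Answer 2's extension, repeated so this block is self-contained.
extension CharacterSet {
    func containsUnicodeScalars(of character: Character) -> Bool {
        return character.unicodeScalars.allSatisfy(contains(_:))
    }
}

// Sketch: the question's scanner, with skip(charactersIn:) built on the extension.
final class MyScanner {
    let str: String
    var idx: String.Index

    init(_ string: String) {
        str = string
        idx = str.startIndex
    }

    var remains: String { return String(str[idx...]) }

    func skip(charactersIn characters: CharacterSet) {
        // Advance while the current Character's scalars are all in the set.
        while idx < str.endIndex && characters.containsUnicodeScalars(of: str[idx]) {
            idx = str.index(after: idx)
        }
    }
}

let scanner = MyScanner("fizz   buzz fizz")
scanner.skip(charactersIn: .alphanumerics)
scanner.skip(charactersIn: .whitespaces)
print("what remains: \"\(scanner.remains)\"") // what remains: "buzz fizz"
```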