Strange String range behavior with NSRegularExpression matches

Time: 2016-10-25 21:11:31

Tags: regex swift nsregularexpression

I'm trying to parse a raw HTTP response, and my ranges come out wrong when I convert an NSRange to a Range. Here is the relevant code from a playground:

public extension NSRange {
    public func toStringRange(_ str: String) -> Range<String.Index>? {
        guard str.characters.count >= length - location  && location < str.characters.count else { return nil }
        let fromIdx = str.characters.index(str.startIndex, offsetBy: self.location)
        print("from: \(self.location) = \(fromIdx)")
        let toIdx = str.characters.index(fromIdx, offsetBy: self.length)
        return fromIdx..<toIdx
    }
}

let responseString = "HTTP/1.0 200 OK\r\nContent-Length: 193\r\nContent-Type: application/json\r\n"
let responseRange = NSRange(location: 0, length: responseString.characters.count)
let responseRegex = try! NSRegularExpression(pattern: "^(HTTP/1.\\d) (\\d+) (.*?\r\n)(.*)", options: [.anchorsMatchLines])
guard let matchResult = responseRegex.firstMatch(in: responseString, options: [], range: responseRange),
    matchResult.numberOfRanges == 5,
    let versionRange = matchResult.rangeAt(1).toStringRange(responseString),
    let statusRange = matchResult.rangeAt(2).toStringRange(responseString),
    let headersRange = matchResult.rangeAt(4).toStringRange(responseString)
    else { fatalError() }

The output of the print in toStringRange() is:

from: 0 = Index(_base: Swift.String.UnicodeScalarView.Index(_position: 0), _countUTF16: 1)
from: 9 = Index(_base: Swift.String.UnicodeScalarView.Index(_position: 9), _countUTF16: 1)
from: 17 = Index(_base: Swift.String.UnicodeScalarView.Index(_position: 18), _countUTF16: 1)

Why does the third toStringRange() call return a String range that starts at 18 instead of 17?

1 Answer:

Answer 0: (score: 1)

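Your method for converting from NSRange to Range<String.Index> does not work correctly for extended grapheme clusters or for characters outside the "Basic Multilingual Plane" (emoji, flags, etc.).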

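NSRange counts unichars, which correspond to UTF-16 code units (the representation used by NSString). Range<String.Index> counts Swift Characters, which represent extended grapheme clusters.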
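The difference between the two ways of counting can be checked directly. The following is a minimal sketch written with the Swift 4+ spelling `count` (in Swift 3, as used in the question, this would be `characters.count`); the variable names are only illustrative.

```swift
// "\r\n" is a single extended grapheme cluster (one Character),
// but it occupies two UTF-16 code units.
let crlf = "\r\n"
print(crlf.count)        // 1
print(crlf.utf16.count)  // 2

// A flag emoji is one Character built from two regional indicator
// scalars, each of which needs a surrogate pair in UTF-16.
let flag = "🇩🇪"
print(flag.count)        // 1
print(flag.utf16.count)  // 4
```

Any offset taken from an NSRange therefore has to be applied to the UTF-16 view, not to the Character view.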

In your specific case, "\r\n" counts as two UTF-16 code units but as a single Character, which causes the unwanted "shift".

Here is a simplified example:

let responseString = "OK\r\nContent-Length"
let nsRange = (responseString as NSString).range(of: "Content")
print(nsRange.location, nsRange.length) // 4 7
if let sRange1 = nsRange.toStringRange(responseString) {
    print(responseString.substring(with: sRange1)) // "ontent-"
}

Converting through the string's UTF-16 view avoids the shift:

extension String {
    func range(from nsRange: NSRange) -> Range<String.Index>? {
        guard
            let from16 = utf16.index(utf16.startIndex, offsetBy: nsRange.location, limitedBy: utf16.endIndex),
            let to16 = utf16.index(from16, offsetBy: nsRange.length, limitedBy: utf16.endIndex),
            let from = String.Index(from16, within: self),
            let to = String.Index(to16, within: self)
            else { return nil }
        return from ..< to
    }
}

With this method for converting from NSRange to Range<String.Index>, you get the expected result:

if let sRange2 = responseString.range(from: nsRange) {
    print(responseString.substring(with: sRange2)) // "Content"
}
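For readers on Swift 4 or later, Foundation provides the failable initializer `Range(_:in:)`, which performs the same UTF-16 based conversion. The sketch below applies it to the response string from the question; it is not part of the original answer, and the helper `capture(_:of:in:)` is a hypothetical name introduced only for illustration. It also builds the search range from `utf16.count` rather than the Character count, since NSRange lengths are measured in UTF-16 code units.

```swift
import Foundation

let response = "HTTP/1.0 200 OK\r\nContent-Length: 193\r\nContent-Type: application/json\r\n"

// NSRange lengths are measured in UTF-16 code units, so use utf16.count here.
let fullRange = NSRange(location: 0, length: response.utf16.count)

let regex = try! NSRegularExpression(
    pattern: "^(HTTP/1.\\d) (\\d+) (.*?\r\n)(.*)",
    options: [.anchorsMatchLines])

// Hypothetical helper: returns the text of one capture group, or nil if the
// group did not participate in the match or the range cannot be converted.
func capture(_ index: Int, of match: NSTextCheckingResult, in string: String) -> String? {
    guard let range = Range(match.range(at: index), in: string) else { return nil }
    return String(string[range])
}

let match = regex.firstMatch(in: response, options: [], range: fullRange)
let version = match.flatMap { capture(1, of: $0, in: response) }
let status = match.flatMap { capture(2, of: $0, in: response) }
let headers = match.flatMap { capture(4, of: $0, in: response) }

print(version ?? "nil")  // HTTP/1.0
print(status ?? "nil")   // 200
print(headers ?? "nil")  // starts at "Content-Length", with no leading shift
```

Because both the search range and the conversion work in UTF-16 code units, the "\r\n" pairs no longer move the start of the fourth capture group.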