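Splitting a string by separators and keeping the separators in place

Date: 2017-01-11 09:46:20

Tags: swift

Unlike string.components(separatedBy: ...), I want to keep the separators in the result array. Code illustrates this best: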

let input = "foo&bar|hello"
let output = input.tokenize(splitMarks: ["&", "|"])
let desiredResult = ["foo", "&", "bar", "|", "hello"]

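Is there anything in the standard library that does this? If not, how could I implement such a function?

2 answers:

Answer 0: (score: 6)

For this, you need to iterate over the String and check whether each character is a split mark. You can write an extension on String for this.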

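extension String {

    func stringTokens(splitMarks: Set<String>) -> [String] {

        var string = ""                  // text collected since the last split mark
        var desiredOutput = [String]()
        // Iterate the characters directly; String.characters is unavailable in current Swift.
        for ch in self {
            if splitMarks.contains(String(ch)) {
                // Flush the pending text, then keep the split mark as its own element.
                if !string.isEmpty {
                    desiredOutput.append(string)
                }
                desiredOutput.append(String(ch))
                string = ""
            }
            else {
                string += String(ch)
            }
        }
        // Append whatever follows the last split mark.
        if !string.isEmpty {
            desiredOutput.append(string)
        }
        return desiredOutput
    }
}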

Now you can call the function like this:

let input = "foo&bar|hello"
print(input.stringTokens(splitMarks: ["&", "|"]))

Output:

["foo", "&", "bar", "|", "hello"]
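As a variation on the loop above, the split marks can be stored as a Set<Character>, so each character is tested directly instead of being converted to a String first. This is only a sketch, assuming Swift 4 or later; the name tokensKeeping(separators:) is hypothetical and not part of the original answer. The second call shows what happens with a leading split mark and two adjacent ones: no empty strings are produced.

```swift
extension String {
    /// Splits at every character in `separators`, keeping each separator
    /// as its own element. Hypothetical helper name, for illustration only.
    func tokensKeeping(separators: Set<Character>) -> [String] {
        var tokens = [String]()
        var current = ""                 // text collected since the last separator
        for ch in self {
            if separators.contains(ch) {
                // Flush pending text before emitting the separator itself.
                if !current.isEmpty {
                    tokens.append(current)
                    current = ""
                }
                tokens.append(String(ch))
            } else {
                current.append(ch)
            }
        }
        // Text after the last separator, if any.
        if !current.isEmpty {
            tokens.append(current)
        }
        return tokens
    }
}

let marks: Set<Character> = ["&", "|"]
print("foo&bar|hello".tokensKeeping(separators: marks))
// ["foo", "&", "bar", "|", "hello"]
print("&foo||bar".tokensKeeping(separators: marks))
// ["&", "foo", "|", "|", "bar"]
```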

Answer 1: (score: 3)

You can use rangeOfCharacter(from: CharacterSet, ...) in a loop to find the next split mark in the string, and then append the preceding part and the separator to the array:

import Foundation  // CharacterSet and rangeOfCharacter(from:options:range:)

extension String {

    func tokenize(splitMarks: String) -> [String] {

        let cs = CharacterSet(charactersIn: splitMarks)
        var result = [String]()
        var pos = startIndex
        while let range = rangeOfCharacter(from: cs, range: pos..<endIndex) {
            // Append string preceding the split mark
            // (slices are Substring in Swift 4 and later, so convert to String):
            if range.lowerBound != pos {
                result.append(String(self[pos..<range.lowerBound]))
            }
            // Append split mark:
            result.append(String(self[range]))
            // Update position for next search:
            pos = range.upperBound
        }
        // Append string following the last split mark:
        if pos != endIndex {
            result.append(String(self[pos..<endIndex]))
        }
        return result
    }
}

Example:

let input = "foo&bar|hello"
let output = input.tokenize(splitMarks: "&|")
print(output)
// ["foo", "&", "bar", "|", "hello"]
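Both answers treat every split mark as a single character. The question's call site passes an array of strings, so a variant that also accepts multi-character marks might look like the sketch below. It is an assumption, not something either answer proposes: it uses only the standard library, checks the marks in array order at each position (so list a longer mark before any shorter mark that is its prefix), and ignores empty marks.

```swift
extension String {
    /// Sketch: splits at any of the given marks, which may be longer than
    /// one character, and keeps each matched mark in the result.
    func tokenize(splitMarks: [String]) -> [String] {
        var result = [String]()
        var tokenStart = startIndex      // start of the text not yet emitted
        var pos = startIndex             // current scan position
        while pos < endIndex {
            let rest = self[pos...]
            // First mark (in array order) that matches at the current position.
            if let mark = splitMarks.first(where: { !$0.isEmpty && rest.starts(with: $0) }) {
                if tokenStart < pos {
                    result.append(String(self[tokenStart..<pos]))
                }
                result.append(mark)
                pos = index(pos, offsetBy: mark.count)
                tokenStart = pos
            } else {
                pos = index(after: pos)
            }
        }
        // Text following the last split mark:
        if tokenStart < endIndex {
            result.append(String(self[tokenStart...]))
        }
        return result
    }
}

print("a&&b|c".tokenize(splitMarks: ["&&", "|"]))
// ["a", "&&", "b", "|", "c"]
```

This scans each position against every mark, which is fine for short inputs and a handful of marks; for single-character marks the versions above are simpler.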