So here is the string s:
"Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
I want to split it into an array:
["Hi", "How are you", "I'm fine", "It is 6 p.m", "Thank you", "That's it"]
That means the delimiters should be ". ", "? " and "! ".
I tried:
let charSet = NSCharacterSet(charactersInString: ".?!")
let array = s.componentsSeparatedByCharactersInSet(charSet)
But it also splits p.m. into two elements. Result:
["Hi", " How are you", " I'm fine", " It is 6 p", "m", " Thank you", " That's it"]
I also tried:
let array = s.componentsSeparatedByString(". ")
It works for splitting on ". ", but it gets messy if I also want to split on "? " and "! ".
So is there a way to do this? Thanks!
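Answer 0 (score: 5)
There is a method, enumerateSubstringsInRange, that lets you enumerate the substrings of a string. You can enumerate by words, by sentences, or with other options. No regular expressions are needed.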
let s = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
var sentences = [String]()
s.enumerateSubstringsInRange(s.startIndex..<s.endIndex, options: .BySentences) {
substring, substringRange, enclosingRange, stop in
sentences.append(substring!)
}
print(sentences)
The result is:
["Hi! ", "How are you? ", "I'm fine. ", "It is 6 p.m. ", "Thank you! ", "That's it."]
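Note that each enumerated sentence keeps its terminator and trailing space, while the question asks for elements without them. A minimal sketch of one way to bridge that gap in current Swift syntax follows. The helper names trimmedSentence and sentences(in:) are hypothetical, and dropping exactly one trailing ".", "!" or "?" is an assumption chosen to reproduce the "It is 6 p.m" element from the question.

```swift
import Foundation

/// Hypothetical helper: trims surrounding whitespace, then drops a single
/// trailing ".", "!" or "?" if one is present.
func trimmedSentence(_ raw: String) -> String {
    var text = raw.trimmingCharacters(in: .whitespacesAndNewlines)
    if let last = text.last, ".!?".contains(last) {
        text.removeLast()
    }
    return text
}

/// Hypothetical helper: enumerates the text by sentences and cleans each one.
func sentences(in text: String) -> [String] {
    var result = [String]()
    text.enumerateSubstrings(in: text.startIndex..<text.endIndex,
                             options: .bySentences) { substring, _, _, _ in
        if let substring = substring {
            result.append(trimmedSentence(substring))
        }
    }
    return result
}
```

Calling sentences(in: s) on the question's string should then give the array the question asks for, assuming the sentence enumerator segments the text as in the result shown above.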
Answer 1 (score: 3)
rmaddy's answer is correct (+1). A Swift 3 implementation is:
var sentences = [String]()
string.enumerateSubstrings(in: string.startIndex ..< string.endIndex, options: .bySentences) { substring, substringRange, enclosingRange, stop in
    sentences.append(substring!)
}
You can also use a regular expression with NSRegularExpression, though it is a lot hairier than rmaddy's .bySentences solution. In Swift 3:
var sentences = [String]()
let regex = try! NSRegularExpression(pattern: "(^|\\s+)(\\w.*?[.!?]+)(?=(\\s+|$))")
// NSRange counts UTF-16 code units, so use utf16.count rather than characters.count
regex.enumerateMatches(in: string, range: NSMakeRange(0, string.utf16.count)) { match, flags, stop in
    sentences.append((string as NSString).substring(with: match!.rangeAt(2)))
}
Or Swift 2:
let regex = try! NSRegularExpression(pattern: "(^|\\s+)(\\w.*?[.!?]+)(?=(\\s+|$))", options: [])
var sentences = [String]()
// NSRange counts UTF-16 code units, so use utf16.count rather than characters.count
regex.enumerateMatchesInString(string, options: [], range: NSMakeRange(0, string.utf16.count)) { match, flags, stop in
    sentences.append((string as NSString).substringWithRange(match!.rangeAtIndex(2)))
}
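In this pattern (backslashes are doubled because it is written as a Swift string literal):
[.!?] matches any one of those three characters.
| means "or".
^ matches the start of the string.
$ matches the end of the string.
\\s matches a whitespace character.
\\w matches a "word" character.
* matches zero or more of the preceding character.
+ matches one or more of the preceding character.
(?=) is a look-ahead assertion (i.e., it checks whether something is there, but does not advance through that match).
I've tried to simplify this, but it is still fairly complicated. Regular expressions offer rich text pattern matching but are admittedly a little dense when you first use them. This version also handles (a) repeated punctuation, (b) leading spaces, and (c) trailing spaces.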
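For newer Swift versions, the same pattern can be applied roughly as follows. This is a sketch rather than the answer author's code: the function name sentencesByRegex(in:) is hypothetical, and it uses the NSRange and Range initializers that convert between String indices and UTF-16 offsets.

```swift
import Foundation

/// Hypothetical helper: returns capture group 2 of every match, i.e. each
/// sentence including its terminator but without leading whitespace.
func sentencesByRegex(in text: String) -> [String] {
    let pattern = "(^|\\s+)(\\w.*?[.!?]+)(?=(\\s+|$))"
    guard let regex = try? NSRegularExpression(pattern: pattern) else {
        return []
    }
    // Build the NSRange from String indices so UTF-16 offsets are correct.
    let fullRange = NSRange(text.startIndex..<text.endIndex, in: text)
    return regex.matches(in: text, options: [], range: fullRange).compactMap { match in
        Range(match.range(at: 2), in: text).map { String(text[$0]) }
    }
}

let sample = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
print(sentencesByRegex(in: sample))
// ["Hi!", "How are you?", "I'm fine.", "It is 6 p.m.", "Thank you!", "That's it."]
```

The look-ahead is what keeps "p.m." together: the first "." is followed by "m" rather than whitespace, so the lazy .*? keeps extending until a terminator that is followed by whitespace or the end of the string.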
Answer 2 (score: 1)
If the basis for splitting is something more esoteric than sentences, this extension may work.
extension String {
public func components(separatedBy separators: [String]) -> [String] {
var output: [String] = [self]
for separator in separators {
output = output.flatMap { $0.components(separatedBy: separator) }
}
return output.map { $0.trimmingCharacters(in: .whitespaces)}
}
}
let artists = "Rihanna, featuring Calvin Harris".components(separatedBy: [", with", ", featuring"])
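As an illustration, here is the same extension applied to the string from the question, with the three delimiters the question lists. The extension is repeated so the snippet is self-contained; the variable names are only for this example, and the separators are passed through a typed [String] constant to keep overload resolution unambiguous.

```swift
import Foundation

extension String {
    /// Splits on each separator in turn, then trims whitespace from every piece.
    public func components(separatedBy separators: [String]) -> [String] {
        var output: [String] = [self]
        for separator in separators {
            output = output.flatMap { $0.components(separatedBy: separator) }
        }
        return output.map { $0.trimmingCharacters(in: .whitespaces) }
    }
}

let question = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
let separators: [String] = [". ", "? ", "! "]
let parts = question.components(separatedBy: separators)
print(parts)
// ["Hi", "How are you", "I'm fine", "It is 6 p.m", "Thank you", "That's it."]
```

The last element keeps its final "." because no ". " follows it, so a trailing terminator would still need to be stripped separately.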
Answer 3 (score: 0)
Answer 4 (score: 0)
I also found a regular expression from here:
var pattern = "(?<=[.?!;…])\\s+(?=[\\p{Lu}\\p{N}])"
let s = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
let sReplaced = s.stringByReplacingOccurrencesOfString(pattern, withString:"[*-SENTENCE-*]" as String, options:NSStringCompareOptions.RegularExpressionSearch, range:nil)
let array = sReplaced.componentsSeparatedByString("[*-SENTENCE-*]")
Maybe this is not a good approach, since it has to replace first and then split the string. :)
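In current Swift syntax, the replace-then-split approach could look like the sketch below. The function name splitSentences is hypothetical; the pattern and the marker string are the ones used above.

```swift
import Foundation

/// Hypothetical helper: replaces the whitespace between sentences with a
/// marker, then splits on that marker.
func splitSentences(_ text: String) -> [String] {
    // Whitespace preceded by sentence punctuation and followed by an
    // uppercase letter or a digit.
    let pattern = "(?<=[.?!;…])\\s+(?=[\\p{Lu}\\p{N}])"
    let marker = "[*-SENTENCE-*]"
    return text
        .replacingOccurrences(of: pattern, with: marker, options: .regularExpression)
        .components(separatedBy: marker)
}

let input = "Hi! How are you? I'm fine. It is 6 p.m. Thank you! That's it."
print(splitSentences(input))
// ["Hi!", "How are you?", "I'm fine.", "It is 6 p.m.", "Thank you!", "That's it."]
```

Because the look-ahead requires an uppercase letter or a digit, a sentence that starts with a lowercase letter would not be split off by this pattern.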
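UPDATE
For the regex part, if you also want to match Chinese/Japanese punctuation (where a space after each punctuation mark is not required), you can use the following pattern: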
((?<=[.?!;…])\\s+|(?<=[。！？；…])\\s*)(?=[\\p{L}\\p{N}])