Answer 0 (score: 1)
You can use NSLinguisticTagger to identify SentenceTerminator tokens and then split the text into an array of strings from there.
I used this code, and it worked well.
https://stackoverflow.com/a/57985302/10736184
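import Foundation

let text = "My paragraph with weird punctuation like Nov. 17th."
var r = [Range<String.Index>]()
// Tag each token with its lexical class; r receives the matching token ranges.
let t = text.linguisticTags(
    in: text.startIndex..<text.endIndex,
    scheme: NSLinguisticTagScheme.lexicalClass.rawValue,
    tokenRanges: &r)
var result = [String]()
// Start index of every token tagged as a sentence terminator.
let ixs = t.enumerated().filter {
    $0.1 == "SentenceTerminator"
}.map { r[$0.0].lowerBound }
var prev = text.startIndex
for ix in ixs {
    // Slice from the end of the previous sentence through the terminator.
    let r = prev...ix
    result.append(
        text[r].trimmingCharacters(
            in: NSCharacterSet.whitespaces))
    prev = text.index(after: ix)
}

Now result will be an array of sentence strings. Note that a sentence has to end with '?', '!', '.', etc. to count. If you also want to split on newlines or other lexical classes, you can add

|| $0.1 == "ParagraphBreak"

after the

$0.1 == "SentenceTerminator"

test in the filter closure to do that.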
Answer 1 (score: 0)
Take a look at this link: How to create String split extension with regex in Swift?
It shows how to combine a regex with componentsSeparatedByString (components(separatedBy:) in current Swift).
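As a rough sketch of that idea (not the code from the linked question), one could mark the end of every regex match with a separator and then split on that separator with components(separatedBy:). The helper name splitAfterMatches(of:), the pattern, and the U+001F marker character are all assumptions made for illustration; the marker is simply assumed not to occur in the input.

```swift
import Foundation

extension String {
    /// Hypothetical helper: splits the string after every match of `pattern`.
    /// `marker` is an arbitrary separator assumed not to appear in the text.
    func splitAfterMatches(of pattern: String, marker: String = "\u{1F}") -> [String] {
        guard let regex = try? NSRegularExpression(pattern: pattern) else {
            return [self]  // invalid pattern: return the text unsplit
        }
        let fullRange = NSRange(startIndex..., in: self)
        // "$0" re-inserts the matched text; the marker is appended right after it.
        let marked = regex.stringByReplacingMatches(
            in: self, options: [], range: fullRange, withTemplate: "$0" + marker)
        return marked
            .components(separatedBy: marker)
            .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) }
            .filter { !$0.isEmpty }
    }
}

// Split after runs of '.', '!' or '?' that are followed by whitespace.
let parts = "Is this a test? Yes! It is.".splitAfterMatches(of: "[.!?]+\\s+")
print(parts)  // ["Is this a test?", "Yes!", "It is."]
```

A pattern this simple also splits after abbreviations such as "Nov. 17th", so it is only suitable for text without them.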
Answer 2 (score: 0)
If you are able to use Apple's Foundation, the solution can be quite simple.
import Foundation

let text = """
Let's split some text into sentences.
The text might include dates like Jan.13, 2020, words like S.A.E and numbers like 2.2 or $9,999.99 as well as emojis like 😀! How do I split this?
"""
var sentences: [String] = []
// With .bySentences, the closure's first argument is the sentence substring.
text.enumerateSubstrings(in: text.startIndex..., options: [.localized, .bySentences]) { (sentence, _, _, _) in
    sentences.append(sentence ?? "")
}
Of course, there are many ways to do this in pure Swift. Here is a quick and dirty split:
let simpleText = """
This is a very simple text.
It doesn't include dates, abbreviations, and numbers, but it includes emojis like 😀! How do I split this?
"""
let sentencesPureSwift = simpleText.split(omittingEmptySubsequences: true) { $0.isPunctuation && !Set("',").contains($0) }
This can be refined with reduce().
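One way such a refinement might look is a reduce(into:) pass that keeps each terminator attached to its sentence and drops leading whitespace. This is only a sketch: the sample string and the names terminators and sentencesViaReduce are made up for illustration, and it assumes '.', '!' and '?' are the only sentence terminators, so it still breaks on abbreviations and decimal numbers.

```swift
let sample = "This is simple. It isn't hard, is it? No!"

// Assumption: only these three characters end a sentence.
let terminators: Set<Character> = [".", "!", "?"]

let pieces: [String] = sample.reduce(into: [""]) { acc, ch in
    let last = acc.count - 1
    if terminators.contains(ch) {
        if acc[last].isEmpty, acc.count > 1 {
            // A run such as "?!" stays with the previous sentence.
            acc[last - 1].append(ch)
        } else {
            // Keep the terminator, then open a new (empty) sentence.
            acc[last].append(ch)
            acc.append("")
        }
    } else if ch.isWhitespace && acc[last].isEmpty {
        // Skip whitespace at the start of a sentence.
    } else {
        acc[last].append(ch)
    }
}
// Drop the empty trailing entry left after the final terminator.
let sentencesViaReduce = pieces.filter { !$0.isEmpty }

print(sentencesViaReduce)
// ["This is simple.", "It isn't hard, is it?", "No!"]
```

Unlike the plain split above, this version preserves the punctuation and needs no separate trimming step.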
Answer 3 (score: -1)
Try this:
import Foundation

let myString = "This is a test"
let myWords = myString.components(separatedBy: " ")
// myWords is now: ["This", "is", "a", "test"]