Swift: Split a String into sentences

Time: 2015-04-27 19:23:59

Tags: ios iphone swift

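I would like to know how to split a string containing multiple sentences into an array of sentences.

I know about the split function, but splitting on "." does not work in every case.

Is there something like what is mentioned in this answer?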

4 answers:

Answer 0 (score: 1)

You can use NSLinguisticTagger to identify SentenceTerminator tokens and then split the text into an array of strings from there.

I used this code and it worked well:

https://stackoverflow.com/a/57985302/10736184

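import Foundation

let text = "My paragraph with weird punctuation like Nov. 17th."
var tokenRanges = [Range<String.Index>]()
// Tag every token with its lexical class and collect the token ranges.
let tags = text.linguisticTags(
    in: text.startIndex..<text.endIndex,
    scheme: NSLinguisticTagScheme.lexicalClass.rawValue,
    tokenRanges: &tokenRanges)
var result = [String]()
// Start index of every token tagged as a sentence terminator.
let ixs = tags.enumerated().filter {
    $0.1 == "SentenceTerminator"
}.map { tokenRanges[$0.0].lowerBound }
var prev = text.startIndex
for ix in ixs {
    // Take everything from the previous cut up to and including the terminator.
    let sentenceRange = prev...ix
    result.append(
        text[sentenceRange].trimmingCharacters(in: .whitespaces))
    prev = text.index(after: ix)
}

Now result will be an array of sentence strings. Note that a sentence must end with '?', '!', '.', etc. to be counted. If you also want to split on line breaks or other lexical classes, you can add

|| $0.1 == "ParagraphBreak"

after

$0.1 == "SentenceTerminator"

to do that.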

Answer 1 (score: 0)

Take a look at this link: How to create String split extension with regex in Swift?

It shows how to combine a regex with componentsSeparatedByString.
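As a rough sketch of what combining a regex with a string split could look like (this is not the code from the linked answer): the hypothetical helper `splitting(onRegex:)` below replaces every regex match with a marker character and then splits on that marker with `components(separatedBy:)`, the Swift 3+ spelling of componentsSeparatedByString. The helper name, the marker character, and the pattern are all assumptions made for illustration.

```swift
import Foundation

extension String {
    /// Hypothetical helper: splits the string wherever `pattern` matches.
    func splitting(onRegex pattern: String) -> [String] {
        // An invalid pattern leaves the string unsplit.
        guard let regex = try? NSRegularExpression(pattern: pattern) else { return [self] }
        // Unit separator character, assumed not to occur in the text.
        let marker = "\u{1F}"
        let fullRange = NSRange(startIndex..., in: self)
        // Replace every match with the marker, then split on the marker.
        let marked = regex.stringByReplacingMatches(
            in: self, options: [], range: fullRange, withTemplate: marker)
        return marked.components(separatedBy: marker).filter { !$0.isEmpty }
    }
}

// Split on whitespace that directly follows '.', '!' or '?'.
let parts = "Is it 2.2 or 3? I think 2.2! Fine.".splitting(onRegex: "(?<=[.!?])\\s+")
print(parts) // ["Is it 2.2 or 3?", "I think 2.2!", "Fine."]
```

Because the pattern only matches whitespace that follows a terminator, a number like 2.2 stays intact, but an abbreviation such as "Nov. 17th" would still be split.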

Answer 2 (score: 0)

If you are able to use Apple's Foundation, the solution can be very simple.

import Foundation

let text = """
    Let's split some text into sentences.
    The text might include dates like Jan.13, 2020, words like S.A.E and numbers like 2.2 or $9,999.99 as well as emojis like 👨‍👩‍👧‍👦! How do I split this?
"""
var sentences: [String] = []
// Let Foundation find the sentence boundaries; each substring is one sentence.
text.enumerateSubstrings(in: text.startIndex..., options: [.localized, .bySentences]) { (substring, _, _, _) in
    sentences.append(substring ?? "")
}

Of course, there are many ways to do this in pure Swift. Here is a quick and dirty split:

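let simpleText = """
This is a very simple text.
It doesn't include dates, abbreviations, and numbers, but it includes emojis like 👨‍👩‍👧‍👦! How do I split this?
"""

// Split on every punctuation character except the apostrophe and the comma.
// The result is [Substring]; the punctuation that ends each sentence is dropped.
let sentencesPureSwift = simpleText.split(omittingEmptySubsequences: true) { $0.isPunctuation && !Set("',").contains($0) }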

It can be refined with reduce().
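One possible reading of that refinement, offered as a sketch under assumptions rather than the answerer's own code: the hypothetical function `splitIntoSentences` below uses `reduce(into:)` to walk the characters, keeps the terminating punctuation, and only ends a sentence when a terminator is followed by whitespace. The function name and the terminator set are assumptions.

```swift
/// Hypothetical helper: splits `text` after '.', '!' or '?' when whitespace follows.
func splitIntoSentences(_ text: String,
                        terminators: Set<Character> = [".", "!", "?"]) -> [String] {
    // State carried through reduce(into:): finished sentences, the sentence
    // being built, and whether the previous character was a terminator.
    typealias State = (done: [String], current: String, afterTerminator: Bool)
    let start: State = (done: [], current: "", afterTerminator: false)

    let end = text.reduce(into: start) { state, ch in
        if state.afterTerminator && ch.isWhitespace {
            // Terminator followed by whitespace: close the current sentence.
            state.done.append(state.current)
            state.current = ""
            state.afterTerminator = false
        } else if !(state.current.isEmpty && ch.isWhitespace) {
            // Skip whitespace before a sentence starts; keep everything else.
            state.current.append(ch)
            state.afterTerminator = terminators.contains(ch)
        }
    }
    // Text after the last terminator is kept as a final, unterminated piece.
    return end.current.isEmpty ? end.done : end.done + [end.current]
}

let demo = splitIntoSentences("Hi there. It costs 2.2 dollars! Ok?")
print(demo) // ["Hi there.", "It costs 2.2 dollars!", "Ok?"]
```

Requiring whitespace after the terminator keeps numbers like 2.2 together, but abbreviations followed by a space (for example "Nov. 17th") are still split, so this remains a heuristic for simple text.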

Answer 3 (score: -1)

Try this:

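    import Foundation
    let myString = "This is a test"
    // components(separatedBy:) is the Swift 3+ spelling of componentsSeparatedByString.
    let myWords = myString.components(separatedBy: " ")
    // myWords is now: ["This", "is", "a", "test"] (words, not sentences)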