I am looking at Swift's NSLinguisticTagger. For testing purposes, I used the code from the AppCoda tutorial Introduction to Natural Language Processing.
Here is the code Sai Kambampati uses in his tutorial:
import Foundation

let quote = "Here's to the crazy ones. The misfits. The rebels. The troublemakers. The round pegs in the square holes. The ones who see things differently. They're not fond of rules. And they have no respect for the status quo. You can quote them, disagree with them, glorify or vilify them. About the only thing you can't do is ignore them. Because they change things. They push the human race forward. And while some may see them as the crazy ones, we see genius. Because the people who are crazy enough to think they can change the world, are the ones who do. - Steve Jobs (Founder of Apple Inc.)"

// One shared tagger configured for every scheme used below.
let tagger = NSLinguisticTagger(tagSchemes: [.tokenType, .language, .lexicalClass, .nameType, .lemma], options: 0)
let options: NSLinguisticTagger.Options = [.omitPunctuation, .omitWhitespace, .joinNames]

// Prints the dominant language detected for the text.
func determineLanguage(for text: String) {
    tagger.string = text
    if let language = tagger.dominantLanguage {
        print("The language is \(language)")
    }
}
determineLanguage(for: quote)

// Prints every word token in the text.
func tokenizeText(for text: String) {
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    tagger.enumerateTags(in: range, unit: .word, scheme: .tokenType, options: options) { _, tokenRange, _ in
        let word = (text as NSString).substring(with: tokenRange)
        print(word)
    }
}
tokenizeText(for: quote)

// Prints each word together with its lexical class (part of speech).
func partsOfSpeech(for text: String) {
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange, _ in
        if let tag = tag {
            let word = (text as NSString).substring(with: tokenRange)
            print("\(word): \(tag.rawValue)")
        }
    }
}
partsOfSpeech(for: quote)

// Prints only the tokens tagged as a person, place, or organization name.
func namedEntityRecognition(for text: String) {
    tagger.string = text
    let range = NSRange(location: 0, length: text.utf16.count)
    let tags: [NSLinguisticTag] = [.personalName, .placeName, .organizationName]
    tagger.enumerateTags(in: range, unit: .word, scheme: .nameType, options: options) { tag, tokenRange, _ in
        if let tag = tag, tags.contains(tag) {
            let name = (text as NSString).substring(with: tokenRange)
            print("\(name): \(tag.rawValue)")
        }
    }
}
namedEntityRecognition(for: quote)
round: Noun pegs: Noun...
Apple Inc.: Noun
Apple Inc.: OrganizationName
let quote = "Apple führt die Hitliste der Silicon-Valley-Unternehmen an, bei denen sich Ingenieure das Wohnen in der Nähe nicht mehr leisten können. Dahinter folgen das Portal Reddit (San Francisco), der Suchriese Google (Mountain View) und die sozialen Netzwerke Twitter (San Francisco) und Facebook (Menlo Park)"
Answer 0 (score: 0)
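I have not tested your case above, but I am attaching what I used to develop a part-of-speech tagger. It includes the setLanguage and setOrthography calls. (I have not tried the latter yet.)
My understanding is that the tagger either identifies the language itself and switches when needed, or can be told which language to use. The logic it applies does not seem to be fully documented. I have decided that my best practice is to set the language whenever I can. In this code, the language is stored in the string language. (By the way, in my case it is determined by reading a larger document.)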
// POStagger, language (an optional String), and range are defined elsewhere in my code.
if let language = language {
    // If language has a value, it is taken as a specification for the language of the text and set on the tagger.
    let orthography = NSOrthography.defaultOrthography(forLanguage: language)
    POStagger.setOrthography(orthography, range: range)
    POStagger.setLanguage(NLLanguage(rawValue: language), range: range)
}
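For completeness, here is a minimal self-contained sketch of the same idea applied to a shortened version of the German text from the question. It rests on some assumptions: setLanguage with an NLLanguage value is NLTagger API from the NaturalLanguage framework (macOS 10.14 / iOS 12 or later) rather than NSLinguisticTagger, so the sketch uses NLTagger. The helper name lexicalClasses(in:language:) is hypothetical and not part of the code above. I have not verified what tags it produces for this text, so no output is shown.

```swift
import Foundation
import NaturalLanguage

// Hypothetical helper (not from the answer above): returns each word with its
// lexical class, optionally pinning the language instead of relying on detection.
func lexicalClasses(in text: String, language: NLLanguage? = nil) -> [(word: String, tag: String)] {
    let tagger = NLTagger(tagSchemes: [.lexicalClass])
    tagger.string = text
    let range = text.startIndex..<text.endIndex

    if let language = language {
        // Assumption: the caller already knows the language of the whole text.
        tagger.setLanguage(language, range: range)
    }

    var results: [(word: String, tag: String)] = []
    let options: NLTagger.Options = [.omitPunctuation, .omitWhitespace]
    tagger.enumerateTags(in: range, unit: .word, scheme: .lexicalClass, options: options) { tag, tokenRange in
        if let tag = tag {
            results.append((word: String(text[tokenRange]), tag: tag.rawValue))
        }
        return true // keep enumerating
    }
    return results
}

// Shortened version of the German sample text from the question.
let german = "Apple führt die Hitliste der Silicon-Valley-Unternehmen an."
for (word, tag) in lexicalClasses(in: german, language: .german) {
    print("\(word): \(tag)")
}
```

Calling the helper without the language argument leaves detection to the tagger, which makes it easy to compare both behaviors on the same text.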