Question

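I am doing a keyphrase classification task, and for this I am working on head noun extraction from keyphrases in Python. The little help available on the internet is not of much use. I am struggling with this.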

Answer 1

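This task is known as part-of-speech (PoS) tagging and belongs to the field of natural language processing (NLP). To extract nouns from a text, you can use either nltk or TextBlob.

With nltk:

import nltk

# The tokenizer and tagger data must be downloaded once with nltk.download();
# the exact resource names depend on the NLTK version.
text = 'Your text goes here'

# Check whether a tag is a noun tag (NN, NNS, NNP, NNPS all start with 'NN')
def is_noun(pos):
    return pos[:2] == 'NN'

# Tokenise the text and keep only the nouns
tokenized = nltk.word_tokenize(text)
nouns = [word for (word, pos) in nltk.pos_tag(tokenized) if is_noun(pos)]
print(nouns)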

If you want to learn more about PoS tagging, you may find this post from the official nltk page very useful.

Answer 2

You can use the NLTK toolkit to part-of-speech tag a sentence and then extract the tokens whose tags correspond to nouns, verbs, and so on.

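import nltk

text = '''I am doing a keyphrase classification task and for this i am working with the head noun extraction from keyphrases in python. The little help available on internet is not of good use. i am struggling with this.'''
# Tag every token with its part of speech: a list of (word, tag) tuples
pos_tagged_sent = nltk.pos_tag(nltk.tokenize.word_tokenize(text))

# Keep words with any noun tag (NN, NNS, NNP, NNPS); testing for 'NN' alone would skip plurals
nouns = [word for word, tag in pos_tagged_sent if tag.startswith('NN')]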

Output (pos_tagged_sent):

[('I', 'PRP'),
 ('am', 'VBP'),
 ('doing', 'VBG'),
 ('a', 'DT'),
 ('keyphrase', 'NN'),
 ('classification', 'NN'),
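The tagged tuples above give every noun, but the question asks for the head noun of a keyphrase. A minimal sketch of one common heuristic follows, assuming English noun phrases where the head is the last noun before any prepositional post-modifier. The helper `head_noun` is hypothetical and not part of NLTK, and the tags are written by hand in the `(word, tag)` format that `nltk.pos_tag` returns, so the snippet runs without NLTK data.

```python
# Hypothetical helper (not an NLTK function): a heuristic, not a parser.
def head_noun(tagged_phrase):
    """Return the heuristic head noun of a POS-tagged noun phrase, or None."""
    head = None
    for word, tag in tagged_phrase:
        # Once a noun has been seen, a preposition (IN) starts a post-modifier,
        # e.g. "price of oil": stop so that "price" stays the head.
        if tag == "IN" and head is not None:
            break
        if tag.startswith("NN"):  # NN, NNS, NNP, NNPS
            head = word
    return head


# Hand-written tags, assumed to match what a tagger would produce.
print(head_noun([("keyphrase", "NN"), ("classification", "NN"), ("task", "NN")]))
print(head_noun([("the", "DT"), ("price", "NN"), ("of", "IN"), ("oil", "NN")]))
```

The first call prints `task` and the second prints `price`. The heuristic will fail on phrases with more complex structure, which is where a dependency parse is likely to do better.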

Answer 3

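You can use the Stanford Parser package in NLTK to obtain the dependencies, and then use whichever relation suits you, such as nn or compound (noun compound modifier). You can consult De Marneffe's typed dependencies manual here.

In the manual, the noun phrase "oil price futures" is a compound with two modifiers and one head.

You can check the parse tree and dependencies of any sentence in the Stanford Parser demo interface here.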
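As a rough illustration of how those relations identify the head, the sketch below groups compound modifiers under their governor. It does not call the parser: the triples are written by hand for "oil price futures" in a `(relation, governor, dependent)` layout, which is an assumption of this example, and `compound_heads` is a hypothetical helper rather than part of NLTK or the Stanford tools.

```python
# Hypothetical helper: works on plain tuples, not on real parser output objects.
def compound_heads(dependencies, compound_relations=("nn", "compound")):
    """Map each compound head to the list of its modifiers."""
    heads = {}
    for relation, governor, dependent in dependencies:
        if relation in compound_relations:
            # The governor of an nn/compound relation is the head noun.
            heads.setdefault(governor, []).append(dependent)
    return heads


# Hand-written triples for "oil price futures": both modifiers attach to "futures".
deps = [
    ("nn", "futures", "oil"),
    ("nn", "futures", "price"),
]
print(compound_heads(deps))
```

Here `futures` comes out as the head with `oil` and `price` as its modifiers. With real parser output you would first need to convert the result into triples of this shape.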

Hope this helps,

Cheers

How to extract head nouns from a phrase in Python?

3 answers: