How to classify nouns in a text file using Python

Time: 2018-03-29 19:37:40

Tags: python machine-learning text nltk text-mining

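From business articles, I want to extract the words that define the nature of the business being discussed. For example, if an article contains phrases such as "retail banking", "courier service", or "steel mill", we can tell what kind of business it is about.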

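```
import nltk
from nltk import pos_tag

# Read the article as UTF-8 text (Python 3; no separate .decode() call needed).
# Case is left untouched because the tagger relies on capitalization
# to recognize proper nouns.
with open('bbb_2.txt', encoding='utf8') as text_file:
    t = text_file.read()

tokens = nltk.wordpunct_tokenize(t)

posTagged = pos_tag(tokens)

# Keep only proper nouns (singular and plural).
nnp = [(wrd, tags) for (wrd, tags) in posTagged if tags in ('NNP', 'NNPS')]
```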

With this I can extract the noun entities. But how do I tag them as business-related? To illustrate, here is an example. Suppose this is part of the article:

`Microsoft Corporation is an American multinational technology company with headquarters in Redmond, Washington. It develops, manufactures, licenses, supports and sells computer software, consumer electronics, personal computers, and services. Its best known software products are the Microsoft Windows line of operating systems, the Microsoft Office suite, and the Internet Explorer and Edge web browsers.`

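The task now is to extract the types of products that Microsoft develops. The answer is: computer software, consumer electronics, personal computers, and services. The question is, how do I make the computer understand this?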
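To make the goal concrete, here is a rough sketch of one possible starting point, not a definitive solution. It assumes that the things a company makes tend to appear as a comma-separated list right after verbs such as "develops" or "sells". The `TRIGGER_VERBS` list and the `extract_offerings` helper are hypothetical names invented for this illustration, and the code uses only the standard library `re` module rather than NLTK, so it is a surface-pattern heuristic and not real language understanding.

```python
import re

# Hypothetical list of verbs that often introduce what a company makes or sells.
# This is an assumption for illustration; a real list would need tuning.
TRIGGER_VERBS = ("develops", "manufactures", "licenses", "supports", "sells",
                 "produces", "provides", "offers")


def extract_offerings(text):
    """Return the items listed after a run of trigger verbs, up to the next period."""
    verb = "|".join(TRIGGER_VERBS)
    # One trigger verb, optionally followed by more verbs joined by commas or "and",
    # then everything up to the next period as the object list.
    pattern = re.compile(
        rf"\b(?:{verb})\b(?:\s*,?\s*(?:and\s+)?(?:{verb})\b)*\s+([^.]+)\.",
        re.IGNORECASE,
    )
    offerings = []
    for match in pattern.finditer(text):
        # Split the object list on commas and on the word "and".
        for item in re.split(r",\s*(?:and\s+)?|\s+and\s+", match.group(1)):
            item = item.strip()
            if item:
                offerings.append(item)
    return offerings


sample = ("Microsoft Corporation is an American multinational technology company "
          "with headquarters in Redmond, Washington. It develops, manufactures, "
          "licenses, supports and sells computer software, consumer electronics, "
          "personal computers, and services.")

print(extract_offerings(sample))
# ['computer software', 'consumer electronics', 'personal computers', 'services']
```

This breaks easily: abbreviations containing periods, passive constructions, or product lists phrased differently will all be missed. A more robust variant would probably work on the POS tags from `pos_tag` and chunk noun phrases instead of matching raw text.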

0 answers:

No answers yet