NER Python - extracting nouns from raw data in a file?

Time: 2016-05-10 09:42:19

Tags: python

I have the following code, with which I am trying to extract the nouns from a file:

import nltk
from nltk import word_tokenize
import re
from nltk.corpus import stopwords


fr = open('input.txt','r+')
fw = open('output.txt','a+')

for line in fr:
 line = line.lower() #converting from upper to lower case
 lines = ''.join([i for i in line if not i.isdigit()]) #removing numericals
 for word in lines.split():
  word=re.sub(r'[^a-zA-Z0-9]', ' ',word)
  if word not in stopwords.words('english'):
   tokenized_word=word_tokenize(word)
   tokenized_word=nltk.pos_tag(tokenized_word)
   fw.write(str(tokenized_word))
   fw.write('\n')

fr.close()
fw.close()

fw = open('noun_output.txt','w+')

with open('output.txt','r+') as fr:
 for line in fr:
   word = [word for word,pos in line if pos =='NN']
   print word

When I run this code, I get the following error:

Traceback (most recent call last):
  File "1.py", line 29, in <module>
    word = [word for word,pos in line if pos =='NN']
ValueError: need more than 1 value to unpack

I tried a few things with word (such as word[0][1], and also splitting word), but could not solve the problem. Please help!
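
A likely cause of the ValueError, sketched under some assumptions: each line of output.txt is the text produced by str(tokenized_word), for example [('cat', 'NN')], so iterating over line yields single characters, and a one-character string cannot be unpacked into word, pos. One possible fix is to parse each line back into a list with ast.literal_eval before filtering. The helper name nouns_from_line and the sample lines below are hypothetical, and matching every tag that starts with NN (NN, NNS, NNP, NNPS) rather than only NN is an assumption about what should count as a noun here.

```python
import ast


def nouns_from_line(line):
    """Parse one line written as str(list of (word, pos) tuples); return its nouns."""
    text = line.strip()
    if not text:
        # Skip blank lines instead of failing on them
        return []
    # Turn the text "[('cat', 'NN')]" back into a real list of tuples
    tagged = ast.literal_eval(text)
    # Assumption: any Penn Treebank tag starting with NN is treated as a noun
    return [word for word, pos in tagged if pos.startswith('NN')]


# Hypothetical lines in the same shape the first loop writes to output.txt
sample_lines = ["[('cat', 'NN')]\n", "[('run', 'VB')]\n", "[('dogs', 'NNS')]\n"]

nouns = [w for line in sample_lines for w in nouns_from_line(line)]
print(nouns)
```

In the original script the same helper could be called on each line read from output.txt. Tagging the tokens and filtering them in a single pass, without writing the intermediate file, would also avoid the round trip through text.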

0 answers:

No answers