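I hope you can help me. I am trying to process data saved in a .txt file: I want to tokenize it (into sentences and words) and add POS tags, using a pipeline approach I found online. I think I have managed to read the file in, but I am struggling to generate an output file, that is, to "write" a file-like object once the text has been tokenized. I have tried several things, but I don't think I have grasped the complexity of the task. Many thanks,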
import nltk, re, pprint
from nltk.probability import FreqDist
from nltk.tokenize import sent_tokenize, word_tokenize
def source(texts, targets):  # I used this to import the file
    # Note: the file handle opened below shadows the `texts` argument
    with open('novelaTerror.txt', 'r') as texts:
        for text in texts:
            for t in targets:
                t.send(text)
def sent_tokenize_pipeline(targets):
    while True:
        text = (yield)
        sentences = nltk.sent_tokenize(text)
        for sentence in sentences:
            for target in targets:
                target.send(sentence)
def word_tokenize_pipeline(targets):
    while True:
        sentence = (yield)
        words = nltk.word_tokenize(sentence)
        for target in targets:
            target.send(words)
with open('outnovelaTerror', 'w') as wtexts:  # I tried this to save the file in my working directory
    for line in texts:
        wtexts.write(line)
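For what it's worth, two things seem to stand in the way of the write step above: `texts` only exists inside `source`, and generator-based coroutines have to be advanced to their first `yield` (with `next()`) before `.send()` will work. Below is a minimal sketch of one way to finish the pipeline, with a "sink" stage that writes each tokenized sentence to the output file. The names `coroutine`, `writer`, `run`, `simple_sentences` and `simple_words` are my own hypothetical helpers, not part of NLTK, and the regex tokenizers are only stand-ins so the sketch runs without the NLTK data installed.

```python
import re
from functools import wraps


def coroutine(func):
    """Hypothetical helper: prime a generator so it is ready for .send()."""
    @wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first `yield`
        return gen
    return start


def source(path, targets):
    """Read the input file line by line and push each line downstream."""
    with open(path, 'r', encoding='utf-8') as infile:
        for line in infile:
            for target in targets:
                target.send(line)


@coroutine
def sent_tokenize_pipeline(targets, split_sentences):
    """Split incoming text into sentences and forward each one."""
    while True:
        text = (yield)
        for sentence in split_sentences(text):
            for target in targets:
                target.send(sentence)


@coroutine
def word_tokenize_pipeline(targets, split_words):
    """Split each incoming sentence into a list of tokens and forward it."""
    while True:
        sentence = (yield)
        words = split_words(sentence)
        for target in targets:
            target.send(words)


@coroutine
def writer(outfile):
    """Sink stage: write each token list as one space-separated line."""
    while True:
        words = (yield)
        outfile.write(' '.join(words) + '\n')


# Stand-in tokenizers (assumption: used here only so the sketch runs
# without NLTK; nltk.sent_tokenize / nltk.word_tokenize can be passed instead).
def simple_sentences(text):
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]


def simple_words(sentence):
    return re.findall(r"\w+|[^\w\s]", sentence)


def run(in_path, out_path,
        split_sentences=simple_sentences, split_words=simple_words):
    """Wire the stages together back to front, then feed the source in."""
    with open(out_path, 'w', encoding='utf-8') as outfile:
        sink = writer(outfile)
        words_stage = word_tokenize_pipeline([sink], split_words)
        sents_stage = sent_tokenize_pipeline([words_stage], split_sentences)
        source(in_path, [sents_stage])
```

With this in place, `run('novelaTerror.txt', 'outnovelaTerror')` should write one tokenized sentence per line, and `run('novelaTerror.txt', 'outnovelaTerror', split_sentences=nltk.sent_tokenize, split_words=nltk.word_tokenize)` should do the same with the NLTK tokenizers. POS tags could presumably be added with one more stage of the same shape that calls `nltk.pos_tag(words)` before the writer. Note that because the file is read line by line, a sentence that spans two lines is split at the line break.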