gensim error: 'NoneType' object is not subscriptable during training in FastText

Time: 2018-10-12 18:48:57

Tags: python python-3.x nltk gensim fasttext

While implementing FastText in Python 3.7, I ran into an unexpected error reported as "Exception in thread", which ends with:

'NoneType' object is not subscriptable

The full stack trace of the error was posted as a screenshot: [image]

What exactly is causing this problem in gensim?
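For context, the message itself is Python's generic TypeError for indexing a value that is None, so somewhere in the training thread an object expected to be an array or list is None. A minimal sketch of that behaviour, independent of gensim (the helper names `first_item` and `describe_failure` are hypothetical and used only for illustration):

```python
# Generic illustration only; this is not gensim code.
def first_item(container):
    """Return the first element of an indexable container."""
    return container[0]


def describe_failure(container):
    """Return the TypeError message raised by indexing, or None on success."""
    try:
        first_item(container)
    except TypeError as exc:
        return str(exc)
    return None


# Indexing None produces the same message that appears in the traceback.
message = describe_failure(None)
print(message)
```

This only explains what the message means; it does not identify which gensim attribute is None in the failing run.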

The code I tried:

import nltk, re
import string
from collections import Counter 
from string import punctuation
from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg, stopwords
from nltk.stem import WordNetLemmatizer
from gensim.models import FastText

def preprocessing():
    raw_data = (gutenberg.raw('shakespeare-hamlet.txt'))
    tokens = word_tokenize(raw_data)
    tokens = [w.lower() for w in tokens]
    #remove punctuation from each word
    table = str.maketrans('', '', string.punctuation)
    stripped = [w.translate(table) for w in tokens]
    global words
    words = [word for word in stripped if word.isalpha()]
    sw = (stopwords.words('english'))
    sw1= (['.', ',', '"', '?', '!', ':', ';', '(', ')', '[', ']', '{', '}'])

    stop=sw+sw1
    words = [w for w in words if not w in stop]
preprocessing()

def freq_count():
    fd = nltk.FreqDist(words)
freq_count()

def intialize_word_embedding():
    model = FastText([words], size = 100, sg = 1, window = 5, min_count = 5, workers = 4)
    model.train([words], total_examples=len(words), epochs=10)
    model.init_sims(replace=True)
    model_name = "mcft"
    model.save(model_name)
    print(len(model.wv.vocab))
intialize_word_embedding()
def load_model():
    model = FastText.load('mcft')
    similarities = model.wv.most_similar('hamlet')
    for word, score in similarities:
        print(word , score)
    print(model.wv.similarity('hamlet', 'king'))
load_model()

Note: the model runs fine when I comment out the following line in the code shown above:

model.train([words], total_examples=len(words), epochs=10)
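One detail that may be worth checking, offered as an assumption rather than a confirmed cause: `[words]` is a corpus containing exactly one sentence, while `total_examples=len(words)` passes the token count. gensim documents `total_examples` as a count of sentences, and its documentation also notes that texts longer than 10,000 tokens are truncated. The sketch below uses only the standard library to show the mismatch and one way to split the tokens into shorter pseudo-sentences; `chunk_tokens`, the chunk size of 1000, and the placeholder token list are all hypothetical.

```python
from typing import List


def chunk_tokens(tokens: List[str], chunk_size: int = 1000) -> List[List[str]]:
    """Split a flat token list into consecutive chunks of at most chunk_size tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


# Placeholder data standing in for the preprocessed Hamlet tokens (assumption).
words_demo = ["hamlet", "king", "queen", "ghost", "sword"] * 5000

single = [words_demo]               # what `[words]` passes: one very long sentence
chunked = chunk_tokens(words_demo)  # many shorter pseudo-sentences

# Token count, sentence count of `[words]`, and sentence count after chunking.
print(len(words_demo), len(single), len(chunked))
```

With sentences prepared this way, the count that matches the corpus would be `len(chunked)` rather than `len(words)`. Whether this removes the exception in a given gensim version has not been verified here.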

0 answers:

No answers yet.