How do I create an embedding matrix using TF-IDF?

Asked: 2020-01-23 12:09:22

Tags: python-3.x nlp recurrent-neural-network word2vec tf-idf

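I am working with a text dataset to predict whether a review's sentiment is positive or negative. I have converted the words to vectors with TF-IDF, and I have loaded the pretrained GloVe embedding file. How do I now use TF-IDF together with the GloVe word embeddings to create an embedding matrix? I want to use the embedding matrix in a recurrent neural network.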

I get an index error when creating the embedding matrix. Please correct me if I have done something wrong in the code.

**TFIDF Vectorizer**

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Keep the 10,000 most frequent terms, drop English stop words,
# and apply sublinear (1 + log) term-frequency scaling.
vectorizer_1 = TfidfVectorizer(max_features=10000, sublinear_tf=True,
                               use_idf=True, stop_words='english')
X_vt = vectorizer_1.fit_transform(X_train)
X_vt.shape
# (426340, 10000)
```

**Glove Embedding**

```python
import numpy as np

# Map each GloVe token to its 100-dimensional vector.
embedding_index = {}
with open('C:/Users/User/glove.6B/glove.6B.100d.txt', encoding="utf-8") as f:
    for line in f:
        values = line.split()
        word = values[0]
        # Convert the coefficients from strings to floats.
        coefs = np.asarray(values[1:], dtype='float32')
        embedding_index[word] = coefs
```

```python
emdedding_matrix = zeros((vocab_size,100))
for feature, names in vectorizer_1.get_feature_items():
    embedding_vector = embedding_index.get(feature)
    if embedding_vector is not None:
        emdedding_matrix[names] = embedding_vector
```
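
For comparison, here is a minimal sketch of how the matrix could be built from a term-to-index mapping. It assumes the mapping is the fitted vectorizer's `vocabulary_` attribute (a dict of term to column index); `TfidfVectorizer` has no `get_feature_items()` method as far as I know. The helper name `build_embedding_matrix` and the toy data are hypothetical, for illustration only. Sizing the matrix from the vocabulary itself, rather than from a separate `vocab_size`, avoids one possible cause of an index error: a row count smaller than the largest term index.

```python
import numpy as np

def build_embedding_matrix(vocabulary, embedding_index, dim=100):
    """Return a (len(vocabulary), dim) matrix; row i holds the vector of the term with index i.

    vocabulary: dict mapping term -> integer index (e.g. vectorizer_1.vocabulary_).
    embedding_index: dict mapping term -> sequence of dim coefficients.
    Terms missing from embedding_index keep an all-zero row.
    """
    matrix = np.zeros((len(vocabulary), dim), dtype="float32")
    for term, idx in vocabulary.items():
        vector = embedding_index.get(term)
        if vector is not None:
            matrix[idx] = np.asarray(vector, dtype="float32")
    return matrix

# Toy stand-ins (hypothetical data, not real GloVe values).
toy_vocabulary = {"bad": 0, "good": 1, "zzz": 2}
toy_index = {"good": ["0.1", "0.2", "0.3"], "bad": ["-0.1", "-0.2", "-0.3"]}
toy_matrix = build_embedding_matrix(toy_vocabulary, toy_index, dim=3)
```

With the question's objects this would be called as `build_embedding_matrix(vectorizer_1.vocabulary_, embedding_index, dim=100)`. Note that an embedding layer in an RNN usually expects sequences of integer token indices that match the matrix rows, so the TF-IDF matrix `X_vt` itself would probably not be a suitable input; the reviews would need to be encoded as index sequences using the same vocabulary.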

Error

0 answers:

No answers yet.