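I trained a TF-IDF-weighted average word2vec model on user reviews and extracted the top features. Now I want to label these top features as positive or negative.
Any suggestions?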
import numpy as np
import pandas as pd

def top_tfidf_feats(row, features, top_n=1):
    ''' Get top n tfidf values in row and return them with their corresponding feature names.'''
    topn_ids = np.argsort(row)[::-1][:top_n]  # indices of the top_n largest weights
    top_feats = [(features[i], row[i]) for i in topn_ids]
    df = pd.DataFrame(top_feats)
    df.columns = ['feature', 'tfidf']
    return df

# final_tf_idf: tf-idf matrix of the reviews (one row per review); tfidf_feat: its feature names
top_tfidf = top_tfidf_feats(final_tf_idf[1,:].toarray()[0], tfidf_feat, 10)
Top 10 features...
feature tfidf
------- ------
0 urgent 0.513783
1 tells 0.501945
2 says 0.490708
3 clear 0.424756
4 care 0.206723
5 not 0.141886
6 flanum 0.000000
7 flap 0.000000
8 flare 0.000000
9 flared 0.000000
10 flares 0.000000
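Two things stand out in the output above. First, the rows with a tf-idf of 0.000000 are words that do not occur in this review at all: the review only has six non-zero terms, so `argsort` pads the list with arbitrary zero-weight features. Second, tf-idf by itself carries no sentiment, so labeling a feature positive or negative needs some extra signal. A minimal sketch, assuming each review has a binary sentiment label (1 = positive, 0 = negative, for example derived from its rating), is to score every feature by its mean tf-idf in positive reviews minus its mean tf-idf in negative reviews, and call it positive or negative by the sign of that difference. The helpers `top_nonzero_feats`, `feature_polarity` and `label_top_feats` and the toy data are hypothetical, for illustration only. A common alternative is to fit a linear classifier such as logistic regression on the tf-idf matrix and read the sign of each feature's coefficient.

```python
import numpy as np
import pandas as pd


def top_nonzero_feats(row, features, top_n=10):
    """Like top_tfidf_feats, but skips features with tf-idf 0
    (words that do not occur in this review)."""
    row = np.asarray(row).ravel()
    nonzero = np.flatnonzero(row)                    # indices of words present in the review
    order = nonzero[np.argsort(row[nonzero])[::-1]]  # sort them by weight, descending
    top = order[:top_n]
    return pd.DataFrame({'feature': [features[i] for i in top],
                         'tfidf': row[top]})


def feature_polarity(X, y, features):
    """Score = mean tf-idf over positive reviews minus mean tf-idf over
    negative reviews. Assumes y holds 1 for positive and 0 for negative."""
    y = np.asarray(y)
    # np.asarray(...).ravel() turns the result into a flat 1-D array
    pos = np.asarray(X[y == 1].mean(axis=0)).ravel()
    neg = np.asarray(X[y == 0].mean(axis=0)).ravel()
    diff = pos - neg
    label = np.where(diff > 0, 'positive',
                     np.where(diff < 0, 'negative', 'neutral'))
    return pd.DataFrame({'feature': features, 'score': diff, 'label': label})


def label_top_feats(top_df, polarity_df):
    """Attach the polarity label to each top feature of one review."""
    return top_df.merge(polarity_df[['feature', 'label']], on='feature', how='left')


# Toy example with made-up data: 3 words, 4 reviews (2 positive, 2 negative)
features = ['great', 'awful', 'coffee']
X = np.array([[0.8, 0.0, 0.6],
              [0.7, 0.0, 0.7],
              [0.0, 0.9, 0.4],
              [0.0, 0.8, 0.6]])
y = np.array([1, 1, 0, 0])

polarity = feature_polarity(X, y, features)
top = top_nonzero_feats(X[0], features, top_n=10)
print(label_top_feats(top, polarity))
```

Words that are frequent in both classes (like "coffee" here) get a small score, so in practice you may want a threshold below which a feature counts as neutral. The example uses a dense array; with a large sparse matrix such as `final_tf_idf`, the same per-class `.mean(axis=0)` calls should avoid converting the whole matrix to dense, but check this against your SciPy version.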