Question

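I have a script that lists the top n words (the words with the highest chi-squared values). However, instead of extracting a fixed number n of words, I want to extract all words whose p-value is below 0.05, i.e. the words for which the null hypothesis is rejected.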

Here is my code:

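from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import chi2

# Vectorize the top 100,000 n-grams (unigrams up to trigrams)
tfidf = TfidfVectorizer(max_features=100000, ngram_range=(1, 3))
X_tfidf = tfidf.fit_transform(df.review_text)
y = df.label

# Keep only the chi-squared statistics (first element of the result)
chi2score = chi2(X_tfidf, y)[0]
# Use get_feature_names() instead on scikit-learn versions before 1.0
scores = list(zip(tfidf.get_feature_names_out(), chi2score))
# Sort ascending by score; named so it does not shadow the imported chi2
sorted_scores = sorted(scores, key=lambda x: x[1])
allchi2 = list(zip(*sorted_scores))

# List the top 20 words
allchi2 = allchi2[0][-20:]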

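So in this case, rather than listing the top 20 words, I want all the words that reject the null hypothesis, i.e. all words in the reviews that depend on the sentiment class (positive or negative).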

Answer 1


List all words in a corpus that reject the null hypothesis using a chi-squared test
