Tfidf empty vocabulary; perhaps the documents only contain stop words

Time: 2018-11-06 16:25:51

Tags: python pandas machine-learning scikit-learn tf-idf

I am currently working on a project and using Tfidf to transform the X_train data, which contains text data. When I run the following line, I get the error shown in the traceback below:

count_vectorizer.fit_transform(X_train)

I have read similar Stack Overflow questions, such as Link, but I could not understand how to split the X_train data.

Here is the traceback from running my Train.py file:

Traceback (most recent call last):
  File "train.py", line 100, in <module>
    counts = count_vectorizer.fit_transform(X_train)
  File "/home/vishalthadari/Documents/Seperation 1/API's/Confirmation API/python 3 /env/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 869, in fit_transform
    self.fixed_vocabulary_)
  File "/home/vishalthadari/Documents/Seperation 1/API's/Confirmation API/python 3 /env/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 811, in _count_vocab
    raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
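
The message means that no token survived tokenization in any document. The sketch below is a rough, standard-library-only diagnostic for finding such documents. It assumes the vectorizer uses scikit-learn's documented default token_pattern (tokens of two or more word characters) and lowercasing. The names surviving_tokens, empty_documents, sample, and stops are hypothetical, and the stop-word set is a tiny stand-in, not scikit-learn's built-in list (by default no stop words are removed at all).

```python
import re

# Assumption: scikit-learn's documented default token_pattern, which keeps
# only tokens made of two or more word characters.
DEFAULT_TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def surviving_tokens(doc, stop_words=frozenset()):
    """Return the tokens of one document that remain after lowercasing,
    default-style tokenization, and stop-word removal."""
    tokens = DEFAULT_TOKEN_PATTERN.findall(doc.lower())
    return [t for t in tokens if t not in stop_words]


def empty_documents(docs, stop_words=frozenset()):
    """Return the indices of documents that contribute no tokens."""
    return [i for i, doc in enumerate(docs)
            if not surviving_tokens(doc, stop_words)]


# Hypothetical sample standing in for X_train.
sample = ["a", "1 2 3", "the of", "spam offer"]
# Tiny illustrative stop-word set, not scikit-learn's built-in list.
stops = frozenset({"the", "of"})

empty = empty_documents(sample, stops)
print(empty)
```

If every index of the real data shows up in this list, the vocabulary would be empty. It is also worth checking that X_train is a one-dimensional sequence of strings: iterating over a whole pandas DataFrame yields its column labels, not its rows, so a single text column would need to be selected first.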

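I followed all the suggested solutions, but the issue is still not solved. Am I transforming the data the wrong way? If I am doing it right, why am I getting this error?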

Thanks in advance

0 answers:

No answers