Question

I am currently working on a project and using Tfidf to transform X_train, which contains text data. When I run the following call, I get the error shown in the traceback below:

count_vectorizer.fit_transform(X_train)

I have read other similar Stack Overflow questions (Link), but I cannot understand how to split the X_train data.

This is the traceback from running my train.py file:

Traceback (most recent call last):
  File "train.py", line 100, in <module>
    counts = count_vectorizer.fit_transform(X_train)
  File "/home/vishalthadari/Documents/Seperation 1/API's/Confirmation API/python 3 /env/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 869, in fit_transform
    self.fixed_vocabulary_)
  File "/home/vishalthadari/Documents/Seperation 1/API's/Confirmation API/python 3 /env/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 811, in _count_vocab
    raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
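
For reference, a minimal sketch that reproduces the same ValueError. The documents here are hypothetical stand-ins, since the real X_train is not shown, and this is only one possible cause: with default settings, CountVectorizer ignores tokens shorter than two characters, so documents made up only of such tokens leave nothing to build a vocabulary from.

```python
from sklearn.feature_extraction.text import CountVectorizer

# Hypothetical stand-in data; the real X_train is not shown in the question.
# Every token here is a single character, which the default token pattern drops.
short_tokens_only = ["a b c", "1 2 3"]
real_text = ["the model predicts labels", "training data contains text"]


def try_fit(docs):
    """Return the vocabulary size, or the error message if fitting fails."""
    vectorizer = CountVectorizer()  # default settings
    try:
        vectorizer.fit_transform(docs)
    except ValueError as exc:
        return str(exc)
    return len(vectorizer.vocabulary_)


short_result = try_fit(short_tokens_only)  # fails: no usable tokens
text_result = try_fit(real_text)           # succeeds: ordinary words

print(short_result)
print(text_result)
```

If a check like this passes on ordinary strings but fails on X_train, it may be worth printing type(X_train) and the first few items to confirm that each element is a plain text string.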

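I have followed all of the suggested solutions, but the problem is still not solved. If I applied them correctly, am I transforming the data the wrong way? Why am I getting this error?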

Thanks in advance.

Tfidf empty vocabulary; perhaps the documents only contain stop words

0 answers: