I am currently working on a project and am using Tfidf to transform X_train, which contains text data. When I call
count_vectorizer.fit_transform(X_train)
I get the error shown in the traceback below.
I read other similar Stack Overflow questions, such as this Link, but I could not understand how to split the X_train data.
This is the traceback from running my train.py file:
Traceback (most recent call last):
  File "train.py", line 100, in <module>
    counts = count_vectorizer.fit_transform(X_train)
  File "/home/vishalthadari/Documents/Seperation 1/API's/Confirmation API/python 3 /env/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 869, in fit_transform
    self.fixed_vocabulary_)
  File "/home/vishalthadari/Documents/Seperation 1/API's/Confirmation API/python 3 /env/lib/python3.6/site-packages/sklearn/feature_extraction/text.py", line 811, in _count_vocab
    raise ValueError("empty vocabulary; perhaps the documents only"
ValueError: empty vocabulary; perhaps the documents only contain stop words
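To make the error message more concrete, here is a minimal sketch of how an empty vocabulary can come about. It uses only the standard library and assumes the vectorizer runs with its default settings, where text is lowercased and only tokens of two or more word characters are kept. The names tokenize, build_vocabulary, good_docs and bad_docs are hypothetical and the sample strings are made up; they are not taken from my train.py or from the real X_train.

```python
import re

# Assumption: this mirrors CountVectorizer's default token pattern,
# which keeps only tokens of two or more word characters.
TOKEN_PATTERN = re.compile(r"(?u)\b\w\w+\b")


def tokenize(doc):
    """Lowercase a document and return its tokens."""
    return TOKEN_PATTERN.findall(doc.lower())


def build_vocabulary(docs):
    """Collect the set of distinct tokens across all documents."""
    vocab = set()
    for doc in docs:
        vocab.update(tokenize(doc))
    return vocab


# Hypothetical sample inputs, not the real X_train.
good_docs = ["The quick brown fox", "jumps over the lazy dog"]
bad_docs = ["a", "b", "", "?"]  # single characters, empty text, punctuation

good_vocab = build_vocabulary(good_docs)
bad_vocab = build_vocabulary(bad_docs)

print(len(good_vocab))  # 8 distinct tokens
print(len(bad_vocab))   # 0 tokens, i.e. an empty vocabulary
```

If running build_vocabulary on X_train also gives an empty set, then X_train probably does not hold the raw text documents I expect at that point, which would be consistent with the ValueError above.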
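I followed all the suggested solutions, but the problem is still not solved. What am I doing wrong? Am I transforming the data the wrong way? Why do I get this error?
Thanks in advance.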