"ValueError: Lengths must match to compare" with chi2 from sklearn.feature_selection

Time: 2019-04-02 23:53:04

Tags: python scikit-learn jupyter-notebook data-science

I am trying to run the code below, but I get an error:

ValueError: Lengths must match to compare

The code comes from https://towardsdatascience.com/multi-class-text-classification-with-scikit-learn-12f1e60e0a9f

The code is:

from sklearn.feature_selection import chi2
import numpy as np

N = 2
for Product, category_id in sorted(category_to_id.items()):
  features_chi2 = chi2(features, labels == category_id)
  indices = np.argsort(features_chi2[0])
  feature_names = np.array(tfidf.get_feature_names())[indices]
  unigrams = [v for v in feature_names if len(v.split(' ')) == 1]
  bigrams = [v for v in feature_names if len(v.split(' ')) == 2]
  print("# '{}':".format(Product))
  print("  . Most correlated unigrams:\n       . {}".format('\n       . '.join(unigrams[-N:])))
  print("  . Most correlated bigrams:\n       . {}".format('\n       . '.join(bigrams[-N:])))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-45-bbfd1a1f6a1a> in <module>()
      3 N = 2
      4 for Product, category_id in sorted(category_to_id.items()):
----> 5   features_chi2 = chi2(features, labels == category)
      6   indices = np.argsort(features_chi2[0])
      7   feature_names = np.array(tfidf.get_feature_names())[indices]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\ops.py in wrapper(self, other, axis)
   1221         # as it will broadcast
   1222         if other.ndim != 0 and len(self) != len(other):
-> 1223             raise ValueError('Lengths must match to compare')
   1224 
   1225         res_values = na_op(self.values, np.asarray(other))

ValueError: Lengths must match to compare

len(features) prints the same count.
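Note that the check which fails lives in pandas, not in chi2: labels == category is evaluated before chi2 is even called. The pandas line in the traceback only raises when the right-hand side has ndim != 0 and a different length from the Series, so whatever category held was not a scalar. A minimal reproduction with made-up values (the exact message wording varies between pandas versions, so only the common substring is checked):

```python
import numpy as np
import pandas as pd

labels = pd.Series([0, 0, 1, 1, 2])  # one category id per document

# Scalar right-hand side: broadcast element-wise, result has len(labels).
mask = labels == 1
print(mask.tolist())  # [False, False, True, True, False]

# Array right-hand side of a different length: pandas refuses to compare.
category = np.array([0, 1, 2])  # hypothetical stale value left in the notebook
error_message = None
try:
    labels == category
except ValueError as err:
    error_message = str(err)
print("ValueError:", error_message)
```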

1 answer:

Answer 0 (score: 1)

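Your traceback shows labels == category on line 5, but the code you posted has labels == category_id. That mismatch is probably the source of your error: make sure the cell that actually runs compares against category_id.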
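If you want the loop to fail with a clearer message the next time a non-scalar slips into the comparison (for instance a stale variable defined in another notebook cell), you could build the one-vs-rest target through a small guard. This is only a sketch; one_vs_rest_target is a hypothetical helper, not part of pandas or scikit-learn.

```python
import numpy as np
import pandas as pd


def one_vs_rest_target(labels: pd.Series, category_id) -> pd.Series:
    """Return a boolean Series: True where `labels` equals `category_id`.

    Hypothetical helper. `category_id` must be a scalar; an array-like
    would make pandas compare element by element and raise
    "Lengths must match to compare" whenever the lengths differ.
    """
    if np.ndim(category_id) != 0:
        raise TypeError(
            f"category_id must be a scalar, got an object of shape {np.shape(category_id)}"
        )
    return labels == category_id


labels = pd.Series([0, 0, 1, 1, 2])
print(one_vs_rest_target(labels, 2).tolist())  # [False, False, False, False, True]

message = None
try:
    one_vs_rest_target(labels, [0, 1, 2])  # array-like, as a stale variable might be
except TypeError as err:
    message = str(err)
print(message)
```

Inside the original loop it would be called as chi2(features, one_vs_rest_target(labels, category_id)).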