NLTK Trainer: can't get the scikit-learn classifiers to work

Time: 2015-07-25 01:35:58

Tags: python scikit-learn nltk

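I am using Python 2.7 and a great tool called NLTK Trainer, created by Jacob Perkins. I have used the NaiveBayes classifier successfully, but when I try any of the scikit-learn classifiers it throws an error. Please help. Here are the command I ran and the resulting error message.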

C:\WINDOWS\system32>C:\Python27\python  C:\Users\ned\Desktop\nltk-trainer-master
\train_classifier.py --instances files --fraction 0.75 --no-pickle --min_score 2
 --ngrams 1 2 3 --show-most-informative 10 movie_reviews --classifier sklearn.Mu
ltinomialNB



training sklearn.MultinomialNB classifier
C:\Python27\lib\site-packages\numpy\core\fromnumeric.py:2499: VisibleDeprecation
Warning: `rank` is deprecated; use the `ndim` attribute or function instead. To
find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
Traceback (most recent call last):
  File "C:\Users\ned\Desktop\nltk-trainer-master\train_classifier.py", line 385,
 in <module>
    print('accuracy: %f' % accuracy(classifier, test_feats))
  File "C:\Python27\lib\site-packages\nltk\classify\util.py", line 87, in accura
cy
    results = classifier.classify_many([fs for (fs, l) in gold])
  File "C:\Python27\lib\site-packages\nltk\classify\scikitlearn.py", line 83, in
 classify_many
    X = self._vectorizer.transform(featuresets)
  File "C:\Users\ned\Desktop\nltk-trainer-master\sklearn\feature_extraction\dict
_vectorizer.py", line 286, in transform
    return self._transform(X, fitting=False)
  File "C:\Users\ned\Desktop\nltk-trainer-master\sklearn\feature_extraction\dict
_vectorizer.py", line 196, in _transform
    result_matrix.sort_indices()
  File "C:\Python27\lib\site-packages\scipy\sparse\compressed.py", line 619, in
sort_indices
    fn( len(self.indptr) - 1, self.indptr, self.indices, self.data)
  File "C:\Python27\lib\site-packages\scipy\sparse\sparsetools\csr.py", line 546
, in csr_sort_indices
    return _csr.csr_sort_indices(*args)
TypeError: Array of type 'byte' required.  Array of type 'bool' given

I am using the following versions: Python 2.7.10

Python 2.7 numpy 1.9.1

Python 2.7 scikit-learn 0.16.1

Python 2.7 scipy 0.10.1

Python 2.7 NLTK 3.0.4

Argparse 1.3.0

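***Thanks for the help, everyone. The problem was indeed an outdated library. I installed the latest versions from here: http://www.lfd.uci.edu/~gohlke/pythonlibs/ following the simple installation guide here: https://www.youtube.com/watch?v=jnpC_Ib_lbc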

3 Answers:

Answer 0 (score: 3)

You are using scipy 0.10.1, which is several versions behind. Try upgrading to scipy 0.14.

Here is a working example, along with the versions of the packages used...

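As a rough sketch of the check this answer suggests (not the example the answer originally showed), the snippet below compares the installed scipy version against 0.14. The helper names `version_tuple` and `meets_minimum` are made up for illustration. Versions are compared as tuples of integers rather than as strings, so that 0.9 sorts before 0.10.

```python
import re


def version_tuple(version):
    """Turn '0.10.1' into (0, 10, 1), keeping only the leading digits of
    each dotted part (so a suffix such as 'rc1' is ignored)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first part that has no leading digits
        parts.append(int(match.group()))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Numeric comparison of two dotted version strings."""
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    try:
        import scipy
    except ImportError:
        print("scipy is not installed")
    else:
        ok = meets_minimum(scipy.__version__, "0.14")
        status = "ok" if ok else "older than 0.14"
        print("scipy %s: %s" % (scipy.__version__, status))
```

With the scipy 0.10.1 listed in the question, `meets_minimum("0.10.1", "0.14")` is false, which matches the diagnosis above.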

Answer 1 (score: 0)

I noticed an issue in this project's GitHub repository that contains the exact same error message:

https://github.com/japerk/nltk-trainer/issues/12

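The user stated:

  Figured it out, I had trained the classifier on a different machine with different versions of scipy and/or sklearn.

In your example above, it looks like you are training on the same machine you are running on. Is that right?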
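If a classifier is pickled on one machine and loaded on another, one way to catch the mismatch described in that issue is to store the library versions next to the pickled object. The following is only a sketch: the function names and the `TRACKED` list are assumptions for illustration and are not part of NLTK Trainer. The command in the question passes `--no-pickle`, so this may not apply to that particular run.

```python
import os
import pickle
import tempfile
import warnings

# Hypothetical default list, taken from the packages named in the question.
TRACKED = ("numpy", "scipy", "sklearn", "nltk")


def library_versions(names):
    """Map each module name to its __version__, or None if unavailable."""
    versions = {}
    for name in names:
        try:
            module = __import__(name)
        except ImportError:
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", None)
    return versions


def dump_with_versions(obj, path, names=TRACKED):
    """Write the version record first, then the object, as two pickles."""
    with open(path, "wb") as handle:
        pickle.dump(library_versions(names), handle, protocol=2)
        pickle.dump(obj, handle, protocol=2)


def load_with_versions(path):
    """Return (object, mismatches); mismatches maps name -> (saved, current).

    The version record is read and compared before the object is
    unpickled, so a warning is issued even if loading the object fails.
    """
    with open(path, "rb") as handle:
        saved = pickle.load(handle)
        current = library_versions(saved)
        mismatches = dict((name, (old, current[name]))
                          for name, old in saved.items()
                          if old != current[name])
        for name, (old, new) in sorted(mismatches.items()):
            warnings.warn("%s: saved with %s, now %s" % (name, old, new))
        obj = pickle.load(handle)
    return obj, mismatches


if __name__ == "__main__":
    demo_path = os.path.join(tempfile.mkdtemp(), "classifier.pickle")
    dump_with_versions({"label": "pos"}, demo_path)
    print(load_with_versions(demo_path))
```

Writing the version record as its own pickle ahead of the object is a design choice: it can be inspected without touching the classifier itself.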

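Answer 2 (score: 0)

Possibly related? https://github.com/scipy/scipy/issues/2058 If not, it may at least shed more light on the problem.

On another note, if this is a version issue, I would check the version of everything. I also think Python 3 is now more actively developed and supported than 2.7.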
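A version check on everything could look roughly like the sketch below. It prints the version and the file location of each package named in the question. The `PACKAGES` tuple and the helper names are assumptions for illustration. The location column is included because the traceback shows sklearn being loaded from the nltk-trainer-master folder while scipy comes from site-packages, which may or may not be intended.

```python
import importlib
import sys

# Packages named in the question; adjust as needed.
PACKAGES = ("numpy", "scipy", "sklearn", "nltk", "argparse")


def describe(name):
    """Return (version, file) for a module, or (None, None) if the
    module cannot be imported."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None, None
    version = getattr(module, "__version__", None)
    location = getattr(module, "__file__", None)
    return version, location


def report(names=PACKAGES):
    """Map each package name to its (version, file) pair."""
    return dict((name, describe(name)) for name in names)


if __name__ == "__main__":
    print("python %s" % sys.version.split()[0])
    for name, (version, location) in sorted(report().items()):
        print("%-10s %-12s %s" % (name, version, location))
```

Running this in the same interpreter and working directory as `train_classifier.py` shows which copy of each library that script would actually import.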