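I am using Python 2.7 with NLTK Trainer, a powerful tool created by Jacob Perkins. I have used the NaiveBayes classifier successfully, but when I try any of the scikit-learn classifiers, it throws an error. Please help. Here is the command I ran and the resulting error message.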
C:\WINDOWS\system32>C:\Python27\python C:\Users\ned\Desktop\nltk-trainer-master\train_classifier.py --instances files --fraction 0.75 --no-pickle --min_score 2 --ngrams 1 2 3 --show-most-informative 10 movie_reviews --classifier sklearn.MultinomialNB
training sklearn.MultinomialNB classifier
C:\Python27\lib\site-packages\numpy\core\fromnumeric.py:2499: VisibleDeprecationWarning: `rank` is deprecated; use the `ndim` attribute or function instead. To find the rank of a matrix see `numpy.linalg.matrix_rank`.
  VisibleDeprecationWarning)
Traceback (most recent call last):
  File "C:\Users\ned\Desktop\nltk-trainer-master\train_classifier.py", line 385, in <module>
    print('accuracy: %f' % accuracy(classifier, test_feats))
  File "C:\Python27\lib\site-packages\nltk\classify\util.py", line 87, in accuracy
    results = classifier.classify_many([fs for (fs, l) in gold])
  File "C:\Python27\lib\site-packages\nltk\classify\scikitlearn.py", line 83, in classify_many
    X = self._vectorizer.transform(featuresets)
  File "C:\Users\ned\Desktop\nltk-trainer-master\sklearn\feature_extraction\dict_vectorizer.py", line 286, in transform
    return self._transform(X, fitting=False)
  File "C:\Users\ned\Desktop\nltk-trainer-master\sklearn\feature_extraction\dict_vectorizer.py", line 196, in _transform
    result_matrix.sort_indices()
  File "C:\Python27\lib\site-packages\scipy\sparse\compressed.py", line 619, in sort_indices
    fn( len(self.indptr) - 1, self.indptr, self.indices, self.data)
  File "C:\Python27\lib\site-packages\scipy\sparse\sparsetools\csr.py", line 546, in csr_sort_indices
    return _csr.csr_sort_indices(*args)
TypeError: Array of type 'byte' required. Array of type 'bool' given
I am using the following versions, all installed for Python 2.7:
Python 2.7.10
numpy 1.9.1
scikit-learn 0.16.1
scipy 0.10.1
NLTK 3.0.4
argparse 1.3.0
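***Thanks, everyone, for your help. The problem was indeed an outdated library. I installed the latest versions from here: http://www.lfd.uci.edu/~gohlke/pythonlibs/ following the simple installation guide here: https://www.youtube.com/watch?v=jnpC_Ib_lbc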
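Answer 0 (score: 3)
You are using scipy 0.10.1, which is several releases behind. Try upgrading to scipy 0.14.
Here is a working example, along with the versions of the packages used...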
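To compare your own setup against the versions discussed here, the following is a minimal sketch rather than anything taken from NLTK Trainer. The helper names (`parse_version`, `installed_version`, `is_at_least`) are hypothetical, and the only assumption about the libraries is that each exposes a `__version__` attribute. The 0.14 threshold is simply the version suggested in this answer. The code is written to run on both Python 2.7 and Python 3.

```python
from __future__ import print_function  # keeps print() usable on Python 2.7

import importlib
import re


def parse_version(text):
    """Turn '0.10.1' into (0, 10, 1); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def installed_version(module_name):
    """Return the module's __version__ string, or None if it cannot be read."""
    try:
        module = importlib.import_module(module_name)
    except Exception:
        # A missing or broken install is treated the same way: no version.
        return None
    return getattr(module, "__version__", None)


def is_at_least(version_text, minimum_text):
    """Compare two dotted version strings numerically, not alphabetically."""
    return parse_version(version_text) >= parse_version(minimum_text)


if __name__ == "__main__":
    for name in ("numpy", "scipy", "sklearn", "nltk"):
        print(name, installed_version(name))

    scipy_version = installed_version("scipy")
    if scipy_version is not None and not is_at_least(scipy_version, "0.14"):
        print("scipy", scipy_version, "is older than 0.14; consider upgrading")
```

Comparing parsed tuples rather than raw strings matters here: as plain strings, "0.10.1" and "0.14" would be compared character by character, which gives wrong answers for versions such as "0.9".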
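Answer 1 (score: 0)
I noticed an issue in this project's GitHub repository that reports the exact same error message:
https://github.com/japerk/nltk-trainer/issues/12
The user states:
"Figured it out: I trained the classifier on a different machine with different versions of scipy and/or sklearn."
In your example above, it looks like you are training on the same machine you are running on. Is that right?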
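If a classifier is pickled on one machine and loaded on another, one way to catch the kind of mismatch described in that issue is to store the library versions next to the pickled object and compare them on load. This is only a sketch under that assumption: `save_with_versions` and `load_with_check` are hypothetical helpers, not part of NLTK Trainer, and the version strings in the demo are taken from this thread for illustration.

```python
from __future__ import print_function  # keeps print() usable on Python 2.7

import os
import pickle
import tempfile


def save_with_versions(obj, path, versions):
    """Pickle obj together with the library versions used to build it."""
    with open(path, "wb") as handle:
        pickle.dump({"versions": dict(versions), "object": obj}, handle)


def load_with_check(path, current_versions):
    """Load the object and list every library whose version has changed.

    Each mismatch is a tuple (name, saved_version, current_version).
    """
    with open(path, "rb") as handle:
        payload = pickle.load(handle)
    saved = payload["versions"]
    mismatches = [
        (name, saved[name], current_versions.get(name))
        for name in sorted(saved)
        if saved[name] != current_versions.get(name)
    ]
    return payload["object"], mismatches


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "classifier.pickle")

    # Stand-in for a trained classifier; any picklable object works.
    save_with_versions({"model": "placeholder"}, path,
                       {"scipy": "0.10.1", "sklearn": "0.16.1"})

    _, mismatches = load_with_check(path,
                                    {"scipy": "0.14.0", "sklearn": "0.16.1"})
    for name, saved, current in mismatches:
        print(name, "was", saved, "when saved but is now", current)
```

This does not apply to the command in the question, which passes --no-pickle and so trains and evaluates in a single run, but it makes a cross-machine mismatch visible before it surfaces as an obscure TypeError.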
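Answer 2 (score: 0)
Possibly related? https://github.com/scipy/scipy/issues/2058 If not, it may at least shed more light on the problem.
On another note, if this is a version issue, I would check the versions of everything. I also believe Python 3 is now more actively developed and supported than 2.7.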