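I am using sklearn Pipeline and FeatureUnion to create features from text files, and I want to print out the feature names.
First, I collect all the transformers into a list.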
In [225]:components
Out[225]:
[TfidfVectorizer(analyzer=u'word', binary=False, decode_error=u'strict',
dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
lowercase=True, max_df=0.85, max_features=None, min_df=6,
ngram_range=(1, 1), norm='l1', preprocessor=None, smooth_idf=True,
stop_words='english', strip_accents=None, sublinear_tf=True,
token_pattern=u'(?u)[#a-zA-Z0-9/\\-]{2,}',
tokenizer=StemmingTokenizer(proc_type=stem, token_pattern=(?u)[a-zA-Z0-9/\-]{2,}),
use_idf=True, vocabulary=None),
TruncatedSVD(algorithm='randomized', n_components=150, n_iter=5,
random_state=None, tol=0.0),
TextStatsFeatures(),
DictVectorizer(dtype=<type 'numpy.float64'>, separator='=', sort=True,
sparse=True),
DictVectorizer(dtype=<type 'numpy.float64'>, separator='=', sort=True,
sparse=True),
TfidfVectorizer(analyzer=u'word', binary=False, decode_error=u'strict',
dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
lowercase=True, max_df=0.85, max_features=None, min_df=6,
ngram_range=(1, 2), norm='l1', preprocessor=None, smooth_idf=True,
stop_words='english', strip_accents=None, sublinear_tf=True,
token_pattern=u'(?u)[a-zA-Z0-9/\\-]{2,}',
tokenizer=StemmingTokenizer(proc_type=stem, token_pattern=(?u)[a-zA-Z0-9/\-]{2,}),
use_idf=True, vocabulary=None)]
For example, the first component is a TfidfVectorizer() object.
components[0]
Out[226]:
TfidfVectorizer(analyzer=u'word', binary=False, decode_error=u'strict',
dtype=<type 'numpy.int64'>, encoding=u'utf-8', input=u'content',
lowercase=True, max_df=0.85, max_features=None, min_df=6,
ngram_range=(1, 1), norm='l1', preprocessor=None, smooth_idf=True,
stop_words='english', strip_accents=None, sublinear_tf=True,
token_pattern=u'(?u)[#a-zA-Z0-9/\\-]{2,}',
tokenizer=StemmingTokenizer(proc_type=stem, token_pattern=(?u)[a-zA-Z0-9/\-]{2,}),
use_idf=True, vocabulary=None)
type(components[0])
Out[227]: sklearn.feature_extraction.text.TfidfVectorizer
But when I try to call the TfidfVectorizer method get_feature_names, it throws a NotFittedError:
components[0].get_feature_names()
Traceback (most recent call last):
File "<ipython-input-228-0160deb904f5>", line 1, in <module>
components[0].get_feature_names()
File "C:\Users\fheng\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 903, in get_feature_names
self._check_vocabulary()
File "C:\Users\fheng\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\feature_extraction\text.py", line 275, in _check_vocabulary
check_is_fitted(self, 'vocabulary_', msg=msg),
File "C:\Users\fheng\AppData\Local\Continuum\Anaconda\lib\site-packages\sklearn\utils\validation.py", line 678, in check_is_fitted
raise NotFittedError(msg % {'name': type(estimator).__name__})
**NotFittedError: TfidfVectorizer - Vocabulary wasn't fitted.**
Answer 0 (score: 5)
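Did you use this list in a Pipeline or FeatureUnion? Did you call the fit() method?
This error means you tried to access the fitted values directly without first calling fit() (that is, without training the model). The vocabulary, and therefore the feature names, only exists after fitting.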
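As a rough illustration of the point above, here is a minimal sketch on a made-up three-document corpus (the `docs` list, the `feature_names` helper, and the step names `words` and `bigrams` are hypothetical and not from the question). It assumes a scikit-learn install; because newer releases expose `get_feature_names_out()` while older ones, like the one in the traceback, expose `get_feature_names()`, the helper tries both.

```python
from sklearn.exceptions import NotFittedError
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion

# Hypothetical toy corpus standing in for the real text files.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "cats and dogs are pets",
]


def feature_names(estimator):
    """Return feature names using whichever accessor this sklearn version has."""
    getter = getattr(estimator, "get_feature_names_out", None) or estimator.get_feature_names
    return list(getter())


vectorizer = TfidfVectorizer()

# Before fit(): no vocabulary has been learned, so asking for names fails.
try:
    feature_names(vectorizer)
    raised = False
except NotFittedError:
    raised = True

# After fit(): the vocabulary exists and the names can be listed.
vectorizer.fit(docs)
names = feature_names(vectorizer)

# The same applies to a FeatureUnion: fit it first, then ask for the names.
# Each name is prefixed with the step name, e.g. "words__cat".
union = FeatureUnion([
    ("words", TfidfVectorizer()),
    ("bigrams", TfidfVectorizer(ngram_range=(2, 2))),
])
union.fit(docs)
union_names = feature_names(union)

print(raised)
print(names)
print(union_names[:5])
```

Note that not every transformer in the list from the question necessarily provides a feature-name accessor, so a union containing such a step may still fail after fitting, for a different reason.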