TFIDF in Python

Date: 2017-06-02 06:40:07

Tags: python python-2.7 tf-idf

Below is my function for creating a TF-IDF matrix in Python:

def tf_idf(self,job_id,method='local'):
    jobtext = self.get_job_text ( job_id , method=method )
    tfidf_vectorizer = TfidfVectorizer( max_df=0.8 , max_features=200000 ,
                                        min_df=0.2 , stop_words='english' ,
                                        use_idf=True , tokenizer=self.tokenize_and_stem(jobtext), ngram_range=(1, 3) )
    #tfidf_vectorizer.fit(jobtext)
    tfidf_matrix = tfidf_vectorizer.fit_transform(jobtext) #fit the vectorizer to synopses
    print(tfidf_matrix.shape)

I get the following error:

Traceback (most recent call last):
  File ".../employment_skills_extraction-master/api/process_request.py", line 206, in <module>
    main()
  File ".../employment_skills_extraction-master/api/process_request.py", line 202, in main
    print pr.process(json.dumps(test))
  File ".../employment_skills_extraction-master/api/process_request.py", line 188, in process
    termVector=self.tf_idf(job_id)
  File ".../employment_skills_extraction-master/api/process_request.py", line 174, in tf_idf
    tfidf_matrix = tfidf_vectorizer.fit_transform(jobtext) #fit the vectorizer to synopses
  File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py", line 1285, in fit_transform
    X = super(TfidfVectorizer, self).fit_transform(raw_documents)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py", line 804, in fit_transform
    self.fixed_vocabulary_)
  File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py", line 739, in _count_vocab
    for feature in analyze(doc):
  File "/usr/local/lib/python2.7/dist-packages/sklearn/feature_extraction/text.py", line 236, in <lambda>
    tokenize(preprocess(self.decode(doc))), stop_words)
TypeError: 'list' object is not callable
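For reference: the last frame of the traceback is the analyzer's lambda, which calls tokenize(...) on each document. In scikit-learn's TfidfVectorizer, that tokenize function comes from the tokenizer parameter, which is expected to be a callable. In the code above, tokenizer=self.tokenize_and_stem(jobtext) calls the method immediately and passes its return value (presumably a list of tokens) rather than the method itself. The library-free sketch below mimics that mechanism; make_analyzer and this tokenize_and_stem are hypothetical stand-ins, not scikit-learn's actual code.

```python
# Library-free sketch of how a vectorizer's analyzer uses a `tokenizer`.
# make_analyzer and tokenize_and_stem are hypothetical stand-ins,
# NOT scikit-learn's real implementation.


def tokenize_and_stem(text):
    """Hypothetical tokenizer: lowercase, split on whitespace,
    and naively strip a trailing 's' (a crude stand-in for stemming)."""
    tokens = text.lower().split()
    return [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]


def make_analyzer(tokenizer, preprocess=lambda doc: doc):
    """Return a function that, like the lambda in the traceback,
    calls tokenizer(...) once per document."""
    return lambda doc: tokenizer(preprocess(doc))


docs = ["Python developers write scripts", "Data scientists use Python"]

# Wrong: calling the function first hands the analyzer a list, not a callable.
bad = make_analyzer(tokenize_and_stem(docs[0]))
try:
    bad(docs[0])
except TypeError as exc:
    print(exc)  # 'list' object is not callable

# Right: pass the function object itself; the analyzer calls it on each doc.
good = make_analyzer(tokenize_and_stem)
print([good(d) for d in docs])
```

With scikit-learn itself, the corresponding change would be passing tokenizer=self.tokenize_and_stem, without the parentheses and the jobtext argument, so that the vectorizer can call it once per document.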

Could someone help me understand why I am getting this error?

1 answer:

Answer 0 (score: 0)

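"TypeError: 'list' object is not callable" looks like the relevant part of the error, and it concerns your variable job_id, which may not be what you think it is. Whatever it is supposed to be, it is probably a list that contains the value you want (I don't know how long the list is).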

If you insert a line as the second line of the function and rename the variable to keep things tidy, like this:

job_id_element = job_id[0]
jobtext = self.get_job_text ( job_id_element , method=method )

it might work.

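Just check the contents of the variable job_id and consider whether you need its first element (the 0 I wrote), the last one (len(job_id) - 1 instead of 0), or perhaps some other element.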
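The check described above can be sketched as a small helper. pick_job_id is a hypothetical name, and the sketch assumes that when job_id turns out to be a list, one of its elements is the ID that get_job_text expects; the job ID strings are made-up sample values.

```python
def pick_job_id(job_id, index=0):
    """Hypothetical helper: unwrap job_id if it turns out to be a list.

    index=0 takes the first element; index=-1 (equivalent to
    len(job_id) - 1) takes the last one.
    """
    if isinstance(job_id, list):
        return job_id[index]
    return job_id  # already a single value, use it unchanged


print(pick_job_id(["job-17", "job-42"]))      # job-17
print(pick_job_id(["job-17", "job-42"], -1))  # job-42
print(pick_job_id("job-99"))                  # job-99
```

Python's negative index -1 refers to the same element as len(job_id) - 1, so either form selects the last entry.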