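I'm working on a classification problem in Python. The thing is, I'm not very good at Python yet, so I've been stuck on the same problem for a long time and I don't know how to solve it. I hope you can help me :)
Here is my code: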
```python
tableau = pandas.DataFrame({'Exigence': exigence, 'Résumé': resume})
df2, targets = encode_target(tableau, "Exigence")
features = list(df2.columns[:4])
for line in resume:
    terms = prep.ngram_tokenizer(text=line)
    mx.add_doc(doc_id='some-unique-identifier',
               doc_class=df2["Target"],
               doc_terms=terms,
               frequency=True,
               do_padding=True)
```
I get this error:
```
objects are mutable, thus they cannot be hashed
Traceback (most recent call last):
  File "<ipython-input-9-072e9c71917a>", line 7, in <module>
    do_padding=True)
  File "C:\Users\nouguierc\AppData\Local\Continuum\Anaconda3\lib\site-packages\irlib\matrix.py", line 222, in add_doc
    if doc_class in self.classes:
TypeError: __hash__ method should return an integer
```
When I go to line 222 of matrix.py, I see this:
```python
if doc_class in self.classes:
    self.classes[doc_class].add(my_doc_terms)
```
The function containing these lines is:
```python
def add_doc(self, doc_id = '', doc_class='', doc_terms=[],
            frequency=False, do_padding=False):
    ''' Add new document to our matrix:
        doc_id: Identifier for the document, eg. file name, url, etc.
        doc_class: You might need this in classification.
        doc_terms: List of terms you got after tokenizing the document.
        frequency: If true, term occurences is incremented by one.
                   Else, occurences is only 0 or 1 (a la Bernoulli)
        do_padding: Boolean. Check do_padding() for more info.
    '''
    # Update list of terms if new term seen.
    # And document (row) with its associated data.
    my_doc_terms = SuperList()
    for term in doc_terms:
        term_idx = self.terms.unique_append(term)
        #my_doc_terms.insert_after_padding(self.terms.index(term))
        if frequency:
            my_doc_terms.increment_after_padding(term_idx,1)
        else:
            my_doc_terms.insert_after_padding(term_idx,1)
    self.docs.append({ 'id': doc_id,
                       'class': doc_class,
                       'terms': my_doc_terms})
    # Update list of document classes if new class seen.
    # self.classes.unique_append(doc_class)
    if doc_class in self.classes:
        self.classes[doc_class].add(my_doc_terms)
    else:
        self.classes[doc_class] = my_doc_terms
    if do_padding:
        self.do_padding()
```
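The failing line, `if doc_class in self.classes:`, has to compute `hash(doc_class)` if `self.classes` is a dict (an assumption based on the `self.classes[doc_class] = ...` assignment; irlib may use a dict-like subclass). A minimal stdlib-only sketch of the same bookkeeping, where the helper `register` is hypothetical and a list stands in for any mutable, unhashable object such as a pandas Series:

```python
# Mimics only the class bookkeeping at the end of add_doc,
# assuming self.classes behaves like a plain dict.
classes = {}

def register(doc_class, terms):
    """Hypothetical helper: group terms under their class label."""
    if doc_class in classes:          # dict membership calls hash(doc_class)
        classes[doc_class].extend(terms)
    else:
        classes[doc_class] = list(terms)

register("spam", ["a", "b"])          # a string label is hashable, so this works
register("spam", ["c"])
print(classes)                        # {'spam': ['a', 'b', 'c']}

try:
    register(["spam", "ham"], ["d"])  # a mutable container cannot be hashed
except TypeError as exc:
    print("TypeError:", exc)          # TypeError: unhashable type: 'list'
```

Any label passed as `doc_class` therefore has to be hashable (a string, an int, a tuple), not a whole column.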
Any idea what's going wrong?
Answer:
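You are passing an object as doc_class. Check what df2['Target'] returns: it is probably a pandas Series, i.e. the whole column rather than the label of the current document. Convert it to a single string for each document, then pass that.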
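A minimal sketch of that fix, assuming `df2` keeps the same row order as `resume` (it was built from `tableau`, so this holds unless rows were dropped or reordered) and that `df2["Target"]` holds one label per row. `FakeMatrix` is a hypothetical stand-in for the irlib matrix so the example runs without irlib, and `line.split()` stands in for `prep.ngram_tokenizer`. The answer suggests a string; the integer label used here is equally hashable.

```python
import pandas as pd

class FakeMatrix:
    """Hypothetical stand-in for irlib's matrix: keeps only the class bookkeeping."""

    def __init__(self):
        self.classes = {}

    def add_doc(self, doc_id='', doc_class='', doc_terms=(),
                frequency=False, do_padding=False):
        if doc_class in self.classes:   # requires a hashable doc_class
            self.classes[doc_class].extend(doc_terms)
        else:
            self.classes[doc_class] = list(doc_terms)

resume = ["first summary", "second summary", "third summary"]
df2 = pd.DataFrame({"Résumé": resume, "Target": [0, 1, 0]})

# Passing the whole column is what fails: a Series cannot be hashed.
try:
    hash(df2["Target"])
except TypeError as exc:
    print("TypeError:", exc)

mx = FakeMatrix()
# Pair each text with its own label instead of passing df2["Target"] itself.
for i, (line, target) in enumerate(zip(resume, df2["Target"])):
    terms = line.split()                 # stand-in for prep.ngram_tokenizer(text=line)
    mx.add_doc(doc_id=f"doc-{i}",        # one identifier per document
               doc_class=int(target),    # a plain, hashable int
               doc_terms=terms,
               frequency=True,
               do_padding=True)

print(mx.classes)
# {0: ['first', 'summary', 'third', 'summary'], 1: ['second', 'summary']}
```

With irlib installed, the same loop would call the real `mx.add_doc` with these arguments; only `doc_class` (and, optionally, a distinct `doc_id` per document) changes from the original code.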