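I am currently working on a multi-label classification task for text data. I have a data frame with an ID column, a text column, and several label columns that contain only 1 or 0.
I used an existing solution proposed on Kaggle, Toxic Comment Classification using Bert, which reports as a percentage how strongly a text belongs to each label.
Now that I have trained my model, I would like to test it on a few unlabeled text extracts to get the percentage for each label.
I have tried this solution: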
def getPrediction(in_sentences):
    label = ['S1, S2, S3']
    input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, label=label) for x in in_sentences]
    input_features = run_classifier.convert_examples_to_features(input_examples, LABEL_COLUMNS, MAX_SEQ_LENGTH, tokenizer)
    predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)
    predictions = estimator.predict(predict_input_fn)
    return [(sentence, prediction['probabilities'], labels[prediction['labels']]) for sentence, prediction in zip(in_sentences, predictions)]

pred_sentences = [
    "here is an exemple of sentence"]
pred_sentences = ''.join(pred_sentences)
predictions = getPrediction(pred_sentences)
I got:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-490-770bf0871d3e> in <module>
----> 1 predictions = getPrediction(pred_sentences)

<ipython-input-486-3de7328d60db> in getPrediction(in_sentences)
      2 label = ['S1','S2',
      3 'S3']
----> 4 input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, labels=label) for x in in_sentences]
      5 input_features = run_classifier.convert_examples_to_features(input_examples, LABEL_COLUMNS, MAX_SEQ_LENGTH, tokenizer)
      6 predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)

<ipython-input-486-3de7328d60db> in <listcomp>(.0)
      2 label = ['S1,
      3 S2,S3']
----> 4 input_examples = [run_classifier.InputExample(guid="", text_a = x, text_b = None, labels=label) for x in in_sentences]
      5 input_features = run_classifier.convert_examples_to_features(input_examples, LABEL_COLUMNS, MAX_SEQ_LENGTH, tokenizer)
      6 predict_input_fn = run_classifier.input_fn_builder(features=input_features, seq_length=MAX_SEQ_LENGTH, is_training=False, drop_remainder=False)

TypeError: __init__() got an unexpected keyword argument 'labels'
Do you know what changes are needed to make the last part of the algorithm work?
Answer 0 (score: 0)
You have a typo: InputExample expects a keyword argument named label, not labels:
[run_classifier.InputExample(guid="", text_a = x, text_b = None, labels=label) for x in in_sentences]
                                                                      ^
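A minimal sketch of the fix follows. It does not import BERT. Instead it uses a stand-in InputExample class whose constructor signature (guid, text_a, text_b, label) is assumed to match run_classifier.InputExample, and a hypothetical helper named build_examples. It also shows a second thing worth checking in the question's code: ''.join(pred_sentences) turns the list into one string, so the list comprehension would then iterate over single characters rather than sentences.

```python
class InputExample:
    """Stand-in for run_classifier.InputExample (signature is an assumption)."""

    def __init__(self, guid, text_a, text_b=None, label=None):
        self.guid = guid
        self.text_a = text_a
        self.text_b = text_b
        self.label = label


def build_examples(in_sentences, label):
    """Hypothetical helper: one InputExample per item of in_sentences."""
    # Note the singular keyword `label`, which the constructor accepts.
    return [
        InputExample(guid="", text_a=x, text_b=None, label=label)
        for x in in_sentences
    ]


pred_sentences = ["here is an exemple of sentence"]
label = ["S1", "S2", "S3"]

# Using `labels=` reproduces the TypeError from the traceback.
try:
    InputExample(guid="", text_a=pred_sentences[0], text_b=None, labels=label)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Using `label=` works, and passing the list yields one example per sentence.
examples = build_examples(pred_sentences, label)
print(len(examples), repr(examples[0].text_a))

# Joining the list first yields one example per character instead.
joined = "".join(pred_sentences)
char_examples = build_examples(joined, label)
print(len(char_examples) == len(joined))
```

If the same holds for the real class, keeping pred_sentences as a list and renaming the keyword should be enough to get past this particular error. Whether the rest of getPrediction runs depends on parts of the setup that are not shown in the question.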