Speeding up the evaluation of a multi-label classifier

Posted: 2019-08-30 07:34:46

Tags: python-3.x scikit-learn spacy

I have a text classification program in spaCy for a multi-label classification problem, and the evaluation takes a very long time.

It is not obtaining the probabilities that takes so long; it is computing the log loss, precision, recall, and F-score.

from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import log_loss
import numpy as np
import time


def evaluate(
    nlp, texts, cats, labels, threshold=0.3, beta=0.5, batch_size=8
):
    t0 = time.time()
    docs = nlp.pipe(texts, batch_size=batch_size)
    t1 = time.time()
    pred_probs = np.array([list(doc.cats.values()) for doc in docs])

    avg_log_loss = log_loss(cats, pred_probs)
    results = {'log_loss': avg_log_loss}
    y_pred = pred_probs > threshold
    prc, rec, fscore, _ = precision_recall_fscore_support(
        y_true=cats, y_pred=y_pred, beta=beta, average='micro', warn_for=set()
    )
    results[f'f{beta}_{threshold}'] = fscore
    results[f'prc_{threshold}'] = prc
    results[f'rec_{threshold}'] = rec
    t2 = time.time()
    print(f"Used {t1-t0} on predicting; {t2-t1} on scoring")
    return results
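For scale, the metric computation itself is cheap once the probabilities are in a NumPy array. A minimal sketch of just the thresholding and micro-averaged scoring step, with invented labels and probabilities (3 samples, 4 labels):

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Invented multi-label ground truth and predicted probabilities (3 samples, 4 labels).
y_true = np.array([[1, 0, 1, 0],
                   [0, 1, 0, 0],
                   [1, 1, 0, 1]])
pred_probs = np.array([[0.9, 0.2, 0.7, 0.1],
                       [0.3, 0.8, 0.2, 0.4],
                       [0.6, 0.7, 0.1, 0.9]])

# Binarize at the same 0.3 threshold used in evaluate() above.
y_pred = pred_probs > 0.3
prc, rec, fscore, _ = precision_recall_fscore_support(
    y_true=y_true, y_pred=y_pred, beta=0.5, average='micro', warn_for=set()
)
print(f"precision={prc:.3f} recall={rec:.3f} f0.5={fscore:.3f}")
```

On arrays this size the call completes in microseconds, so a 377-second "scoring" phase suggests the time is being spent elsewhere.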

The output is Used 4.76837158203125e-06 on predicting; 377.1225287914276 on scoring. Is there any way to speed this up?
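One thing worth noting about the timing itself: nlp.pipe returns a lazy generator, so t1 - t0 only measures creating the generator. The actual model inference runs later, inside the list comprehension that consumes the docs, and is therefore counted in the "scoring" interval. The same effect can be sketched with a plain generator (the slow_items helper is invented for illustration):

```python
import time

def slow_items(n):
    # Invented stand-in for nlp.pipe: a lazy generator that only does
    # its work (here, a sleep simulating per-doc inference) when consumed.
    for i in range(n):
        time.sleep(0.01)
        yield i

t0 = time.time()
gen = slow_items(5)          # returns immediately: no work has happened yet
t1 = time.time()
results = [x for x in gen]   # the "inference" happens here, on consumption
t2 = time.time()

print(f"create: {t1 - t0:.6f}s, consume: {t2 - t1:.6f}s")
```

Wrapping the pipe call in list(...) before taking the second timestamp would attribute the time to the correct phase.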

0 answers:

No answers yet