I have a text classification program in spaCy for a multi-label classification problem. Evaluation takes a very long time. It is not obtaining the prediction probabilities that is slow; rather, it is computing the log loss, precision, recall, and F-score.
```python
from sklearn.metrics import precision_recall_fscore_support
from sklearn.metrics import log_loss
import numpy as np
import time

def evaluate(
    nlp, texts, cats, labels, threshold=0.3, beta=0.5, batch_size=8
):
    t0 = time.time()
    docs = nlp.pipe(texts, batch_size=batch_size)
    t1 = time.time()
    pred_probs = np.array([list(doc.cats.values()) for doc in docs])
    avg_log_loss = log_loss(cats, pred_probs)
    results = {'log_loss': avg_log_loss}
    y_pred = pred_probs > threshold
    prc, rec, fscore, _ = precision_recall_fscore_support(
        y_true=cats, y_pred=y_pred, beta=beta, average='micro', warn_for=set()
    )
    results[f'f{beta}_{threshold}'] = fscore
    results[f'prc_{threshold}'] = prc
    results[f'rec_{threshold}'] = rec
    t2 = time.time()
    print(f"Used {t1-t0} on predicting; {t2-t1} on scoring")
    return results
```
Output: `Used 4.76837158203125e-06 on predicting; 377.1225287914276 on scoring`

Is there any way to speed this up?
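One thing worth noting about the timing itself: `nlp.pipe` returns a lazy generator, so calling it costs almost nothing, and the actual model inference only happens when the docs are consumed inside the `np.array(...)` comprehension. That would explain the microsecond "predicting" time and suggests much of the 377 s attributed to "scoring" may really be prediction. The sketch below reproduces the pitfall with a plain generator standing in for `nlp.pipe` (no spaCy required; `slow_predict` is a hypothetical stand-in for model inference):

```python
import time

def slow_predict(text):
    # Stand-in for real model inference work.
    time.sleep(0.01)
    return {"LABEL": 0.5}

def pipe(texts):
    # Lazy, like nlp.pipe: the body runs only when iterated.
    for t in texts:
        yield slow_predict(t)

texts = ["some text"] * 20

t0 = time.time()
docs = pipe(texts)        # returns instantly -- nothing predicted yet
t1 = time.time()
preds = [list(d.values()) for d in docs]  # inference happens HERE
t2 = time.time()

print(f"create generator: {t1 - t0:.4f}s; consume it: {t2 - t1:.4f}s")
```

To get an honest split between prediction and scoring, materialize the generator before taking the second timestamp, e.g. `docs = list(nlp.pipe(texts, batch_size=batch_size))` before `t1 = time.time()`. That will show where the time actually goes before optimizing the metric computation.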