
Date: 2019-12-17 15:52:23

Tags: embedding corpus pre-trained-model

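Just as with word2vec / GloVe, I want to train BERT embeddings from scratch on my own domain-specific corpus. I would then use these embeddings for sentence similarity (I have already used SBERT). However, I don't want to use any pre-trained models or data (such as models fine-tuned for classification or next-sentence prediction).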
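To make the sentence-similarity step concrete, here is a minimal sketch that assumes you already have per-token vectors from a BERT encoder trained on your corpus. It mean-pools them into one sentence vector (the pooling strategy SBERT uses by default) and compares sentences with cosine similarity. The function names are hypothetical helpers, not part of any library, and the toy vectors are made up for illustration.

```python
import math
from typing import List

Vector = List[float]

def mean_pool(token_vectors: List[Vector]) -> Vector:
    """Average per-token vectors into a single sentence vector."""
    if not token_vectors:
        raise ValueError("need at least one token vector")
    dim = len(token_vectors[0])
    n = len(token_vectors)
    return [sum(vec[i] for vec in token_vectors) / n for i in range(dim)]

def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

# Toy 3-dimensional "token embeddings" standing in for real encoder output.
sent_a = [[1.0, 0.0, 1.0], [1.0, 2.0, 1.0]]
sent_b = [[2.0, 2.0, 2.0]]
print(round(cosine_similarity(mean_pool(sent_a), mean_pool(sent_b)), 4))  # 1.0
```

Mean pooling is used here because it needs no extra parameters; SBERT also offers CLS-token and max pooling, and any of them could replace `mean_pool`.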

So far, the only solution or approach I have found that uses BERT to embed my own corpus is the one used here: https://github.com/google-research/bert/blob/master/run_classifier.py
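For context, run_classifier.py fine-tunes an existing checkpoint on a labelled task. Training from scratch instead means pretraining with BERT's own objectives, masked language modelling and next-sentence prediction. The sketch below shows only the masking rule described in the BERT paper: about 15% of the tokens are selected, and of those 80% become [MASK], 10% become a random token, and 10% stay unchanged. The function name and the special-token set are assumptions for illustration, and the sketch omits refinements such as a per-sequence cap on predictions and whole-word masking.

```python
import random
from typing import List, Optional, Tuple

# Tokens that are never selected for masking (assumed set for this sketch).
SPECIAL_TOKENS = {"[CLS]", "[SEP]"}

def mask_tokens(
    tokens: List[str],
    vocab: List[str],
    mask_prob: float = 0.15,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Tuple[int, str]]]:
    """Apply BERT-style masked-LM corruption.

    Returns the corrupted sequence and (position, original_token) labels
    that the model would be trained to predict.
    """
    if rng is None:
        rng = random.Random()
    candidates = [i for i, tok in enumerate(tokens) if tok not in SPECIAL_TOKENS]
    if not candidates:
        return list(tokens), []
    # Select roughly mask_prob of the non-special positions (at least one).
    num_to_mask = max(1, int(round(len(candidates) * mask_prob)))
    rng.shuffle(candidates)
    masked = list(tokens)
    labels: List[Tuple[int, str]] = []
    for pos in sorted(candidates[:num_to_mask]):
        roll = rng.random()
        if roll < 0.8:
            masked[pos] = "[MASK]"           # 80%: replace with [MASK]
        elif roll < 0.9:
            masked[pos] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
        labels.append((pos, tokens[pos]))
    return masked, labels

sentence = ["[CLS]", "the", "patient", "received", "aspirin", "[SEP]"]
masked, labels = mask_tokens(sentence, vocab=sentence[1:-1], rng=random.Random(0))
print(masked, labels)
```

Keeping some selected tokens unchanged or randomized means the model cannot rely on seeing [MASK] at every position it must predict, which matters because [MASK] never appears when the embeddings are used later.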

Is there a way to do this? Thanks.

Answer 0 (score: 0)

I think your problem is addressed in the following issue: https://github.com/google-research/bert/issues/615
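Assuming the route is the repository's own pretraining pipeline (create_pretraining_data.py to build TFRecord examples, then run_pretraining.py with your own bert_config.json and without --init_checkpoint, so the weights start from random initialization), the main preparation step is the input text. create_pretraining_data.py expects plain text with one sentence per line and an empty line between documents. The following is a minimal sketch of producing that format; the regex sentence splitter is an assumption and will mis-split abbreviations such as "e.g.", so a proper sentence segmenter would be better on real data.

```python
import re
from typing import Iterable, List

def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]

def to_pretraining_text(documents: Iterable[str]) -> str:
    """Format documents as one sentence per line, with an empty line between documents."""
    blocks = []
    for doc in documents:
        sentences = split_sentences(doc)
        if sentences:  # skip documents that contain no text
            blocks.append("\n".join(sentences))
    return "\n\n".join(blocks) + "\n"

docs = [
    "Aspirin reduces fever. It is an NSAID.",
    "BERT is a transformer encoder!",
]
print(to_pretraining_text(docs))
```

The returned string would be written to a file (for example a hypothetical corpus.txt) and passed to create_pretraining_data.py as --input_file. The document boundaries matter because next-sentence prediction draws its negative examples from other documents.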

To generate a domain-specific vocabulary as well, see this repository: https://github.com/kwonmha/bert-vocab-builder
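Why a domain vocabulary matters can be seen from how BERT tokenizes words: WordPiece splits each word greedily into the longest pieces present in vocab.txt, marking continuation pieces with ##. The sketch below reimplements that greedy longest-match-first rule (it omits the maximum-word-length check of BERT's own tokenizer). Both vocabularies are tiny hypothetical examples, not output of bert-vocab-builder.

```python
from typing import List, Set

def wordpiece_tokenize(word: str, vocab: Set[str], unk_token: str = "[UNK]") -> List[str]:
    """Greedy longest-match-first split; continuation pieces carry the '##' prefix."""
    pieces: List[str] = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Shrink the candidate from the right until it is found in the vocabulary.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            return [unk_token]  # no piece matched: the whole word becomes [UNK]
        pieces.append(match)
        start = end
    return pieces

generic_vocab = {"card", "##io", "##my", "##op", "##athy"}
domain_vocab = generic_vocab | {"cardio", "##myopathy"}

print(wordpiece_tokenize("cardiomyopathy", generic_vocab))
# ['card', '##io', '##my', '##op', '##athy']
print(wordpiece_tokenize("cardiomyopathy", domain_vocab))
# ['cardio', '##myopathy']
```

With the generic vocabulary the domain term is fragmented into five pieces; adding the domain pieces reduces it to two, which is the kind of effect a corpus-specific vocabulary aims for.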

Hope this helps!