How to run inference with a Huggingface NER model

Date: 2020-05-29 16:08:17

Tags: pytorch huggingface-transformers

I have trained a custom NER model using the Huggingface library. Training code: https://github.com/huggingface/transformers/tree/master/examples/token-classification

Inference is done with a pipeline pointing at my model:

from transformers import pipeline, AutoTokenizer

class BERTExtractor:
    """class BERTExtractor encapsulates logic to pipe Records with text body
    through a BERT model and return entities separated by Entity Type
    """

    def __init__(self, model_path):
        """Initialize the BERTExtractor pipeline.
        model_path: Path to the Bert language model
        RETURNS (EntityRecognizer): The newly constructed object.
        """
        # load the NER tagger
        self.model_prediction_pipeline = pipeline(
            "ner", model=model_path,
            # tokenizer=model_path,
            tokenizer=AutoTokenizer.from_pretrained('distilbert-base-cased'),
            grouped_entities=True
        )
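For context on what `grouped_entities=True` does: the pipeline merges consecutive wordpiece fragments (tokens prefixed with `##`) back into whole entity spans. A rough, self-contained sketch of that merging idea, using made-up token dicts that only mimic the shape of the pipeline's per-token output:

```python
def group_entities(tokens):
    """Merge consecutive tokens that share an entity label into one span.
    Subword pieces (prefixed '##') are joined without a space.
    Illustration only; the real pipeline also aggregates scores and offsets."""
    groups = []
    for tok in tokens:
        is_subword = tok["word"].startswith("##")
        if groups and tok["entity"] == groups[-1]["entity_group"]:
            if is_subword:
                groups[-1]["word"] += tok["word"].lstrip("#")
            else:
                groups[-1]["word"] += " " + tok["word"]
        else:
            groups.append({"entity_group": tok["entity"], "word": tok["word"]})
    return groups

# Hypothetical token-level predictions:
tokens = [
    {"word": "Hu", "entity": "ORG"},
    {"word": "##gging", "entity": "ORG"},
    {"word": "##face", "entity": "ORG"},
    {"word": "Paris", "entity": "LOC"},
]
grouped = group_entities(tokens)
# → [{'entity_group': 'ORG', 'word': 'Huggingface'},
#    {'entity_group': 'LOC', 'word': 'Paris'}]
```

This is why the code above can filter on `prediction["word"]` directly: with grouping enabled, each result is a whole entity string rather than a wordpiece fragment.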

Inference code:

    async def extract_entities(
        self, input_text_list: List[str]
    ) -> List[RecordDataResponse]:
        """Apply the pre-trained model to a batch of records
        input_text_list (List[str]): The input texts to be predicted
        RETURNS (list): List of responses containing the
        correlating document and a list of entities.
        """

        # Normalize the sequences
        prediction_input_list = self._normalize_input_sequence_length(input_text_list)

        # Perform prediction
        prediction_results_list = [
            prediction
            for prediction_input in prediction_input_list
            for prediction in self.model_prediction_pipeline(prediction_input)
            if prediction
            and prediction["word"] not in self.stop_list
        ]
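The helper `self._normalize_input_sequence_length` is not shown; presumably it splits or truncates inputs so each stays within BERT's 512-token limit. A hypothetical stand-in, where the word-count threshold and the chunk-by-words strategy are my assumptions, not the author's actual code:

```python
def normalize_input_sequence_length(texts, max_words=300):
    """Split any text longer than max_words into word-bounded chunks so
    each chunk stays safely under the model's 512-token limit.
    Hypothetical sketch of the unshown _normalize_input_sequence_length;
    a whitespace word rarely expands into more than a few wordpieces,
    so 300 words leaves headroom."""
    normalized = []
    for text in texts:
        words = text.split()
        for start in range(0, len(words), max_words):
            normalized.append(" ".join(words[start:start + max_words]))
    return normalized
```

Note that a real implementation would likely count tokens with the actual tokenizer rather than whitespace words, since wordpiece counts vary by vocabulary.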

Please clarify the following questions:

  1. Should I train a custom tokenizer and create a vocab file before inference?
  2. Can we use the HuggingFace pipeline for inference, or should we write a custom script that encodes the input text and runs the model ourselves?
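For reference on what the "custom script" alternative in question 2 would look like: encode the text, run a forward pass, and take the argmax over per-token logits. This is a generic sketch, not the author's setup; `distilbert-base-cased` here is only a placeholder (its classification head is randomly initialized, so the labels are meaningless until a fine-tuned checkpoint is used):

```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

def custom_ner_inference(text, model_path="distilbert-base-cased"):
    """Minimal pipeline-free NER inference sketch: tokenize, forward pass,
    argmax over logits, map label ids back to label names."""
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    model = AutoModelForTokenClassification.from_pretrained(model_path)
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    label_ids = logits.argmax(dim=-1)[0].tolist()
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return [(tok, model.config.id2label[i]) for tok, i in zip(tokens, label_ids)]
```

The pipeline does essentially this internally, plus the subword grouping, so for standard token-classification checkpoints the pipeline is usually sufficient.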

Thanks.

0 Answers:

There are no answers yet.