Preprocessing for GPT-2 with Hugging Face Transformers

Date: 2020-03-23 04:03:57

Tags: huggingface-transformers

What exactly should we do in the preprocessing step for GPT-2? Are there any guidelines?

Are the following preprocessing steps reasonable?

1. Remove any \n from the sentence
2. Remove extra spaces from the sentence
3. Leave everything else that is part of the sentence but is not strictly a word (e.g. URLs, non-English words that may appear in an English sentence, emojis, etc.)
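
For concreteness, the three steps above could be sketched roughly as follows. This is only a minimal illustration using Python's standard `re` module; the helper name `clean_sentence` and the example string are hypothetical, and it assumes that a newline should be replaced by a space (rather than deleted outright) so that words on adjacent lines do not get fused together.

```python
import re


def clean_sentence(text: str) -> str:
    """Hypothetical helper: applies steps 1 and 2 and leaves the rest alone (step 3)."""
    # Step 1: replace newlines (and carriage returns) with a single space.
    # Assumption: a space is used instead of plain deletion so that words
    # separated only by a line break are not merged.
    text = re.sub(r"[\r\n]+", " ", text)
    # Step 2: collapse runs of spaces/tabs into one space and trim both ends.
    text = re.sub(r"[ \t]+", " ", text).strip()
    # Step 3: nothing else is touched, so URLs, non-English words,
    # emojis and punctuation pass through unchanged.
    return text


example = "Check   this out:\nhttps://example.com  très bien 🙂 "
cleaned = clean_sentence(example)
print(cleaned)
```

The cleaned string would then be passed to the tokenizer as usual; nothing in this sketch is specific to GPT-2.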

Would it be better to remove extra punctuation or any non-English characters?

0 answers:

No answers