I created a dataset with 9 classes:
classDict = {"text/dokujo-tsushin": "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000001",
"text/it-life-hack": "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000010",
"text/kaden-channel": "000000000000000000000000000000000000000000000000000000000000000000000000000000000000000100",
"text/livedoor-homme": "000000000000000000000000000000000000000000000000000000000000000000000000000000000000001000",
"text/movie-enter": "000000000000000000000000000000000000000000000000000000000000000000000000000000000000010000",
"text/peachy": "000000000000000000000000000000000000000000000000000000000000000000000000000000000000100000",
"text/smax": "000000000000000000000000000000000000000000000000000000000000000000000000000000000001000000",
"text/sports-watch": "000000000000000000000000000000000000000000000000000000000000000000000000000000000010000000",
"text/topic-news": "000000000000000000000000000000000000000000000000000000000000000000000000000000000100000000"}
I set the number of labels to 9 with the following line:
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=args.num_labels)
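Even so, I get a tenth label that is all zeros, and I can't figure out what is going wrong. Has anyone run into the same problem, or does anyone know the cause?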
I checked the training data with glue and found no example labeled with all zeros.
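A check of this kind can be sketched in plain Python as below. This is only an illustration under stated assumptions: `label_index`, `tally_labels`, and `toy_labels` are hypothetical names and data, not taken from the actual training script, and the label strings are shortened to 9 characters for readability. The sketch assumes the class index is the position of the single `1` counted from the right, and it reports an all-zero string as `-1`.

```python
from collections import Counter


def label_index(bits: str) -> int:
    """Map a one-hot string to a 0-based class index, counted from the right.

    Returns -1 for an all-zero string and raises ValueError for any other
    string that is not exactly one-hot.
    """
    if set(bits) - {"0", "1"}:
        raise ValueError(f"not a bit string: {bits!r}")
    ones = bits.count("1")
    if ones == 0:
        return -1  # all-zero label, the case the question is about
    if ones > 1:
        raise ValueError(f"more than one bit set: {bits!r}")
    return len(bits) - 1 - bits.index("1")


def tally_labels(label_strings):
    """Count how often each class index (or -1 for all-zero) occurs."""
    return Counter(label_index(b) for b in label_strings)


# Hypothetical toy labels (assumption: shortened to 9 characters).
toy_labels = ["000000001", "000000010", "100000000", "000000000", "000000010"]
counts = tally_labels(toy_labels)
print(sorted(counts.items()))
```

If the tally over the real training labels contains no `-1` entry and exactly 9 distinct indices, the extra all-zero label is probably not coming from the training data itself. The function does not depend on the string width, so it should also work on the longer strings in `classDict`.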