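After reading the docs and working through the tutorial, I figured I would build a small demo. It turns out my model does not want to train. Here is the code: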
When I run it, the output suggests that very little is being learned.
import spacy
import random
import json

TRAINING_DATA = [
    ["My little kitty is so special", {"KAT": True}],
    ["Dude, Totally, Yeah, Video Games", {"KAT": False}],
    ["Should I pay $1,000 for the iPhone X?", {"KAT": False}],
    ["The iPhone 8 reviews are here", {"KAT": False}],
    ["Noa is a great cat name.", {"KAT": True}],
    ["We got a new kitten!", {"KAT": True}]
]

nlp = spacy.blank("en")
category = nlp.create_pipe("textcat")
nlp.add_pipe(category)
category.add_label("KAT")

# Start the training
nlp.begin_training()

# Loop for 100 iterations
for itn in range(100):
    # Shuffle the training data
    random.shuffle(TRAINING_DATA)
    losses = {}

    # Batch the examples and iterate over them
    for batch in spacy.util.minibatch(TRAINING_DATA, size=2):
        texts = [text for text, entities in batch]
        annotations = [{"textcat": [entities]} for text, entities in batch]
        nlp.update(texts, annotations, losses=losses)
    if itn % 20 == 0:
        print(losses)
This feels wrong. There should be either an error or a meaningful tag. The predictions confirm this.
{'textcat': 0.0}
{'textcat': 0.0}
{'textcat': 0.0}
{'textcat': 0.0}
{'textcat': 0.0}
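It feels like my code is missing something, but I cannot figure out what.
Answer 0 (score: 1)
If you have updated to spaCy 3, the code above will no longer work. The solution is to migrate it with a few changes. I have adapted cantdutchthis's example accordingly.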
Summary of changes:
- The `add_pipe` interface has changed slightly.
- `nlp.update` now requires an `Example` object rather than a `text`, `annotation` tuple.
import random

import spacy
# Add imports for Example, as well as the textcat configs...
from spacy.training import Example
from spacy.pipeline.textcat import single_label_bow_config, single_label_default_config
from thinc.api import Config

# labels should be one-hot encoded
TRAINING_DATA = [
    ["My little kitty is so special", {"KAT0": True}],
    ["Dude, Totally, Yeah, Video Games", {"KAT1": True}],
    ["Should I pay $1,000 for the iPhone X?", {"KAT1": True}],
    ["The iPhone 8 reviews are here", {"KAT1": True}],
    ["Noa is a great cat name.", {"KAT0": True}],
    ["We got a new kitten!", {"KAT0": True}]
]

# bow
# config = Config().from_str(single_label_bow_config)

# textensemble with attention
config = Config().from_str(single_label_default_config)

nlp = spacy.blank("en")
# now uses `add_pipe` with the component name instead of `create_pipe`;
# the config built above is passed in so that it actually takes effect
category = nlp.add_pipe("textcat", last=True, config=config)
category.add_label("KAT0")
category.add_label("KAT1")

# Start the training (spaCy 3 renamed `begin_training` to `initialize`)
nlp.initialize()

# Loop for 100 iterations
for itn in range(100):
    # Shuffle the training data
    random.shuffle(TRAINING_DATA)
    losses = {}

    # Batch the examples and iterate over them
    for batch in spacy.util.minibatch(TRAINING_DATA, size=4):
        texts = [nlp.make_doc(text) for text, entities in batch]
        annotations = [{"cats": entities} for text, entities in batch]

        # uses an Example object rather than a text/annotation tuple
        examples = [
            Example.from_dict(doc, annotation)
            for doc, annotation in zip(texts, annotations)
        ]
        nlp.update(examples, losses=losses)
    if itn % 20 == 0:
        print(losses)
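The comment "labels should be one-hot encoded" in the code above means that each example marks exactly one of the mutually exclusive labels as true. As a minimal sketch, assuming you start from the question's boolean `KAT` annotations, a small helper can expand them into that shape with the false label spelled out explicitly. The name `to_one_hot` and the `LABELS` tuple are hypothetical and not part of spaCy:

```python
# Hypothetical helper (not part of spaCy): expand a single boolean label
# into two mutually exclusive categories, exactly one of which is True.
LABELS = ("KAT0", "KAT1")  # assumed label names, matching the answer above


def to_one_hot(is_cat, labels=LABELS):
    """Return a cats dict with exactly one label set to True."""
    positive, negative = labels
    chosen = positive if is_cat else negative
    return {label: label == chosen for label in labels}


# Two rows in the question's original format.
OLD_DATA = [
    ("My little kitty is so special", {"KAT": True}),
    ("The iPhone 8 reviews are here", {"KAT": False}),
]

# Convert them into the one-hot format used in the spaCy 3 example.
NEW_DATA = [(text, to_one_hot(ann["KAT"])) for text, ann in OLD_DATA]

for text, cats in NEW_DATA:
    print(text, cats)
```

Each resulting dict can be used as the value of `"cats"` when building the annotations for `Example.from_dict`.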
Answer 1 (score: 0)
Based on Ines's comment, here is the answer.
# Note: this uses the spaCy 2.x API (create_pipe, update(docs, annotations))
import spacy
import random
import json

TRAINING_DATA = [
    ["My little kitty is so special", {"KAT": True}],
    ["Dude, Totally, Yeah, Video Games", {"KAT": False}],
    ["Should I pay $1,000 for the iPhone X?", {"KAT": False}],
    ["The iPhone 8 reviews are here", {"KAT": False}],
    ["Noa is a great cat name.", {"KAT": True}],
    ["We got a new kitten!", {"KAT": True}]
]

nlp = spacy.blank("en")
category = nlp.create_pipe("textcat")
# Add the label before the component is added to the pipeline
category.add_label("KAT")
nlp.add_pipe(category)

# Start the training
nlp.begin_training()

# Loop for 100 iterations
for itn in range(100):
    # Shuffle the training data
    random.shuffle(TRAINING_DATA)
    losses = {}

    # Batch the examples and iterate over them
    for batch in spacy.util.minibatch(TRAINING_DATA, size=1):
        docs = [nlp(text) for text, entities in batch]
        # The labels go under the "cats" key, as a plain dict
        annotations = [{"cats": entities} for text, entities in batch]
        nlp.update(docs, annotations, losses=losses)
    if itn % 20 == 0:
        print(losses)
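Compared with the code in the question, the visible differences are that the label is added before the component joins the pipeline, each text is passed through `nlp`, and the annotations use the key `"cats"` with the label dict directly. The sketch below only contrasts the two annotation shapes in plain Python. The variable names are illustrative, and the idea that the `"textcat"` key is what kept the loss at 0.0 is an inference from the diff, not something the answer states:

```python
# Plain-Python illustration; no spaCy import is needed to see the difference.
batch = [
    ("Noa is a great cat name.", {"KAT": True}),
    ("The iPhone 8 reviews are here", {"KAT": False}),
]

# Question: the label dict is wrapped in a list under the key "textcat".
question_annotations = [{"textcat": [labels]} for _, labels in batch]

# Answer: the label dict is passed directly under the key "cats".
answer_annotations = [{"cats": labels} for _, labels in batch]

print(question_annotations[0])  # {'textcat': [{'KAT': True}]}
print(answer_annotations[0])    # {'cats': {'KAT': True}}
```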