Question

When looking at example code for training NER with spaCy, I see GoldParse used in some examples and not in others.

(Then the usual stuff: add labels to the NER pipe, disable the other pipes, and so on.)

Then I see two approaches:

TRAINING_DATA = [
    # (text, annotations); entities are (start_char, end_char, label)
    ("How to preorder the iPhone X", {'entities': [(20, 28, 'GADGET')]}),
    # Lots of other things
]

OR

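import random
from spacy.gold import GoldParse  # spaCy v2.x; GoldParse was removed in v3

# `nlp` and `optimizer` come from the setup described above
for iteration in range(10):
    random.shuffle(TRAINING_DATA)
    losses = {}

    for text, annotations in TRAINING_DATA:
        # Build the Doc and the gold-standard annotations explicitly
        doc = nlp.make_doc(text)
        entity_offsets = annotations["entities"]
        gold = GoldParse(doc, entities=entity_offsets)
        nlp.update([doc], [gold], drop=0.5, sgd=optimizer, losses=losses)
        print('Losses with gold', losses)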

In this example, what is the use (if any) of GoldParse? The loss output differs somewhat, but I don't feel I really understand the difference.

Answer 1

They should be the same under the hood. If you comment out the shuffle, I'd expect the losses to be the same.
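To make "the same under the hood" a bit more concrete: as far as I can tell, in spaCy v2 `nlp.update` accepts either raw text plus an annotations dict, or a `Doc` plus a `GoldParse`, and normalizes the first form into the second before training. The sketch below only imitates that normalization step in plain Python so it runs without spaCy. `FakeDoc`, `FakeGoldParse` and `format_docs_and_golds` are hypothetical stand-ins, not spaCy's real classes, and the whitespace tokenizer is an assumption for illustration.

```python
class FakeDoc:
    """Hypothetical stand-in for a spaCy Doc (whitespace tokens only)."""

    def __init__(self, text):
        self.text = text
        self.tokens = text.split()


class FakeGoldParse:
    """Hypothetical stand-in for GoldParse: just holds the entity offsets."""

    def __init__(self, doc, entities=None):
        self.doc = doc
        self.entities = list(entities or [])


def format_docs_and_golds(docs, golds):
    """Normalize both calling styles into (doc, gold) pairs.

    Raw strings are turned into docs and annotation dicts into gold
    objects; inputs that are already docs/golds pass through unchanged.
    """
    pairs = []
    for doc, gold in zip(docs, golds):
        if isinstance(doc, str):
            doc = FakeDoc(doc)
        if not isinstance(gold, FakeGoldParse):
            gold = FakeGoldParse(doc, **gold)
        pairs.append((doc, gold))
    return pairs


text = "How to preorder the iPhone X"
annotations = {"entities": [(20, 28, "GADGET")]}

# Style 1 (assumed to be the approach without GoldParse): text + dict
implicit = format_docs_and_golds([text], [annotations])

# Style 2: build the doc and gold object yourself, as in the question
doc = FakeDoc(text)
gold = FakeGoldParse(doc, entities=annotations["entities"])
explicit = format_docs_and_golds([doc], [gold])

# Both styles end up with the same text and the same entity offsets
same_text = implicit[0][0].text == explicit[0][0].text
same_entities = implicit[0][1].entities == explicit[0][1].entities
print(same_text, same_entities)
```

If that reading is right, building the `GoldParse` yourself mostly matters when you want to inspect or modify the gold annotations before the update; otherwise passing the text and dict directly should train on the same data.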

When should GoldParse be used when training NER?