Question

I am working on the Rotten Tomatoes NLP sentiment prediction competition.

The training set comes in the following format:

PhraseId SentenceId Phrase Sentiment
1 1 A series of escapades demonstrating the adage that what is good for the goose is also good for the gander , some of which occasionally amuses but none of which amounts to much of a story . 1

2 1 A series of escapades demonstrating the adage that what is good for the goose 2

However, the training set has to be in the following format:

(3 (2 (2 The) (2 Rock)) (4 (3 (2 is) (4 (2 destined) (2 (2 (2 (2 (2 to) (2 (2 be) (2 (2 the) (2 (2 21st) (2 (2 (2 Century) (2 's)) (2 (3 new) (2 (2 ``) (2 Conan)))))))) (2 '')) (2 and)) (3 (2 that) (3 (2 he) (3 (2 's) (3 (2 going) (3 (2 to) (4 (3 (2 make) (3 (3 (2 a) (3 splash)) (2 (2 even) (3 greater)))) (2 (2 than) (2 (2 (2 (2 (1 (2 Arnold) (2 Schwarzenegger)) (2 ,)) (2 (2 Jean-Claud) (2 (2 Van) (2 Damme)))) (2 or)) (2 (2 Steven) (2 Segal))))))))))))) (2 .)))

Here is a snippet of the Python code I am using:

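    # NOTE: `a or b or c` returns the first truthy operand, so this always
    # evaluates to the first expression (the phrase plus a trailing space).
    phrasefind = str(train['Phrase'][i]) + " " or " " + str(train['Phrase'][i]) or str(train['Phrase'][i])
    phrase = train['Phrase'][i]
    # rreplace is a helper defined elsewhere (not shown).
    sent = rreplace(sent, phrasefind, "(" + str(train['Sentiment'][i]) + " " + str(phrase) + ") ", 1)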

Result:

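(1 (2 (2 (2 A) series) (2 escapades) (2 (2 demonstrating) the adage) (2 that) (2 what) is good for the goose (2 is) (2 also)) (3 good) (2 for) (2 the) (2 gander) (2 ,) (2 (2 some)) (2 occasionally) (3 amuses) (2 but) (2 none) (2 of which)) (2 amounts) (2 to) (2 much) (2 of) (2 a story) .)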

However, the Stanford sentiment package does not accept this format (used in place of its train.txt). It throws this error:

Exception in thread "main" java.lang.NumberFormatException: null

Any ideas?

Answer 1

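I am currently learning how to train a model myself.

Looking at your train.txt, the problem is that some words have no sentiment label of their own. I made the following changes to your result, and the command line then successfully added it to my model: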

(1 (2 (2 (2 A) series) (2 of) (2 escapades) (2 (2 demonstrating) (2 the) (2 adage)) (2 that) (2 what) (2 is) (3 good) (2 for) (2 the) (2 goose) (2 is) (2 also) (3 good) (2 for) (2 the) (2 gander) (2 ,) (2 (2 some) (2 of) (2 which)) (2 occasionally) (3 amuses) (2 but) (2 none) (2 of which) (2 amounts) (2 to) (2 much) (2 of) (2 a story) (2 .))
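
To spot unlabeled words before handing a file to the trainer, a small check along these lines might help. It is only a sketch under my own assumptions: that every leaf should look like `(label word)` with an integer label, and that every `(` is followed directly by such a label. The function names are made up for illustration, and this is not how the Stanford reader itself parses the file. It is also stricter than the trainer may be, since it flags multi-word leaves such as `(2 of which)` too.

```python
import re

# Split a tree line into "(", ")" and everything else (labels and words).
TOKEN_RE = re.compile(r"\(|\)|[^\s()]+")


def find_unlabeled_tokens(tree_line):
    """Return tokens that are not part of a clean '(label word)' leaf.

    Assumption (mine): labels are plain integers placed right after '(',
    and each word sits alone between its label and the closing ')'.
    """
    tokens = TOKEN_RE.findall(tree_line)
    problems = []
    for k, tok in enumerate(tokens):
        if tok in ("(", ")"):
            continue
        prev_tok = tokens[k - 1] if k > 0 else None
        next_tok = tokens[k + 1] if k + 1 < len(tokens) else None
        if prev_tok == "(":
            # Label position: must be an integer.
            if not tok.isdigit():
                problems.append(tok)
            continue
        # Word position: expect "(" label word ")" around this token.
        is_clean_leaf = k >= 2 and tokens[k - 2] == "(" and next_tok == ")"
        if not is_clean_leaf:
            problems.append(tok)
    return problems


def is_balanced(tree_line):
    """Check that parentheses open and close in a consistent order."""
    depth = 0
    for ch in tree_line:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0
```

Running `find_unlabeled_tokens` on each line of train.txt and printing any non-empty result should point at the words that still need a `(label word)` wrapper.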
