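I have trained an NLP model with AWS Comprehend. Prediction on the test set ran successfully, but the output file has more lines than the input:
Input: 1000 lines
Output: 2082 lines
The output looks like this: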
predictions.json <...>
{"File": "test.csv", "Line": "0", "Classes": [{"Name": "No", "Score": 0.7022}, {"Name": "Yes", "Score": 0.2892}, {"Name": "tag", "Score": 0.0086}]}
{"File": "test.csv", "Line": "1", "Classes": [{"Name": "No", "Score": 0.6252}, {"Name": "Yes", "Score": 0.3747}, {"Name": "tag", "Score": 0.0001}]}
{"File": "test.csv", "Line": "2", "Classes": [{"Name": "No", "Score": 0.9295}, {"Name": "Yes", "Score": 0.0705}, {"Name": "tag", "Score": 0.0}]}
{"File": "test.csv", "Line": "3", "Classes": [{"Name": "No", "Score": 0.5247}, {"Name": "Yes", "Score": 0.4753}, {"Name": "tag", "Score": 0.0}]}
...
{"File": "test.csv", "Line": "2080", "Classes": [{"Name": "No", "Score": 0.8528}, {"Name": "Yes", "Score": 0.1471}, {"Name": "tag", "Score": 0.0001}]}
{"File": "test.csv", "Line": "2081", "Classes": [{"Name": "No", "Score": 0.5318}, {"Name": "Yes", "Score": 0.4682}, {"Name": "tag", "Score": 0.0}]}
Can anyone help me with this output?
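Each line of predictions.json is a separate JSON object (JSON Lines), so the file can be read one line at a time. Below is a minimal Python sketch that keeps the highest-scoring class per "Line", assuming the record shape shown above; `top_predictions` is a hypothetical helper, not part of any AWS SDK.

```python
import json
from typing import Dict, Iterable, Tuple


def top_predictions(lines: Iterable[str]) -> Dict[int, Tuple[str, float]]:
    """Map each output "Line" number to its highest-scoring class and score.

    Assumes every non-empty line is one JSON object shaped like the
    predictions.json records in the question ("Line" plus "Classes").
    """
    result: Dict[int, Tuple[str, float]] = {}
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue  # tolerate blank lines, e.g. a trailing newline
        record = json.loads(raw)
        # Pick the class with the largest score for this document
        best = max(record["Classes"], key=lambda c: c["Score"])
        # "Line" is stored as a string, so convert it for sorting/joining
        result[int(record["Line"])] = (best["Name"], best["Score"])
    return result


if __name__ == "__main__":
    sample = [
        '{"File": "test.csv", "Line": "0", "Classes": [{"Name": "No", "Score": 0.7022}, {"Name": "Yes", "Score": 0.2892}]}',
        '{"File": "test.csv", "Line": "1", "Classes": [{"Name": "No", "Score": 0.6252}, {"Name": "Yes", "Score": 0.3747}]}',
    ]
    for line_no, (label, score) in sorted(top_predictions(sample).items()):
        print(line_no, label, score)
```

On the real file this would be `with open("predictions.json", encoding="utf-8") as f: preds = top_predictions(f)`. Keep in mind that "Line" most likely counts physical lines of the input file, so it only matches CSV row numbers if no row spans more than one line.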
Answer 0 (score: 0)
One option is to put each sentence in a separate file, use the whole folder as the test set, and set this option:
"InputFormat": "ONE_DOC_PER_FILE"
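A minimal Python sketch of that first option, assuming test.csv is a UTF-8 CSV whose document text sits in a single column (column 0 here is an assumption). The helper name and the doc_00000.txt naming scheme are hypothetical; the folder would then be uploaded to S3 and used as the job input with "InputFormat": "ONE_DOC_PER_FILE" in the InputDataConfig (for example via boto3's start_document_classification_job).

```python
import csv
import tempfile
from pathlib import Path


def split_to_files(csv_path: Path, out_dir: Path, column: int = 0) -> int:
    """Write the text column of each CSV record to its own UTF-8 file.

    Returns the number of files written. File names are hypothetical;
    zero-padding keeps them sorted in the original record order.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = 0
    # newline="" lets the csv module handle newlines inside quoted fields
    with csv_path.open(newline="", encoding="utf-8") as f:
        for index, row in enumerate(csv.reader(f)):
            if not row:
                continue  # skip blank lines
            target = out_dir / f"doc_{index:05d}.txt"
            target.write_text(row[column], encoding="utf-8")
            written += 1
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "test.csv"
        src.write_text('first\n"second\nline"\n', encoding="utf-8")
        print(split_to_files(src, Path(tmp) / "docs"))  # 2
```

Because each record becomes one file, a line break inside a record no longer creates an extra document, and the "File" value in each output record should point back to the source document.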
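The other option is to check how many "\n" characters your dataset contains; that may be the cause. In "ONE_DOC_PER_LINE" mode each line is classified as a separate document, so any record whose text contains a line break produces extra output lines.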
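One way to check this with Python's csv module: compare the number of physical lines with the number of CSV records, list the records that span several lines, and flatten them so each record is exactly one line again. This is a sketch assuming the document text is in a single column; both helper names are hypothetical.

```python
import csv
import io
from typing import List, Tuple


def multiline_records(text: str) -> List[Tuple[int, int]]:
    """Return (record_index, physical_lines) for every CSV record that
    spans more than one physical line (quoted fields containing newlines)."""
    reader = csv.reader(io.StringIO(text, newline=""))
    spans: List[Tuple[int, int]] = []
    previous = 0
    for index, _row in enumerate(reader):
        # line_num counts physical lines consumed so far, not records
        consumed = reader.line_num - previous
        previous = reader.line_num
        if consumed > 1:
            spans.append((index, consumed))
    return spans


def one_doc_per_line(text: str, column: int = 0) -> List[str]:
    """Flatten one text column so every record becomes exactly one line."""
    reader = csv.reader(io.StringIO(text, newline=""))
    # split() + join collapses newlines, tabs and repeated spaces into one space
    return [" ".join(row[column].split()) for row in reader if row]


if __name__ == "__main__":
    sample = 'first review\n"second review\nspans two lines"\nthird review\n'
    print(sample.count("\n"))         # 4 physical lines for 3 records
    print(multiline_records(sample))  # [(1, 2)]
    print(one_doc_per_line(sample))   # ['first review', 'second review spans two lines', 'third review']
```

If the line count is far above the record count (as with 2082 vs 1000), writing the output of one_doc_per_line to a new file, one entry per line, should bring the prediction count back in line with the input.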
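Answer 1 (score: 0)
I ran into the same problem. In my case, the error was caused by the prediction input file (Test.csv in your case) not being saved in the required encoding: AWS Comprehend expects UTF-8.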
AWS Docs Link
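A minimal Python sketch for checking and fixing the encoding. The source encoding cp1252 is only a placeholder guess (use whatever the file was actually saved in), and both helper names are hypothetical. It works on raw bytes so line endings are left untouched.

```python
import tempfile
from pathlib import Path


def is_valid_utf8(path: Path) -> bool:
    """True if the file's raw bytes decode cleanly as UTF-8."""
    try:
        path.read_bytes().decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-encode src to UTF-8, working on bytes so line endings are kept as-is.

    source_encoding is an assumption: use whatever encoding the file was saved in.
    """
    text = src.read_bytes().decode(source_encoding)
    dst.write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "test.csv"
        src.write_bytes("caf\u00e9\n".encode("cp1252"))
        print(is_valid_utf8(src))  # False
        dst = Path(tmp) / "test_utf8.csv"
        convert_to_utf8(src, dst)
        print(is_valid_utf8(dst))  # True
```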