AWS Comprehend custom classification job outputs more rows than the input

Time: 2019-05-21 12:00:04

Tags: amazon-web-services aws-comprehend

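I have trained an NLP model with AWS Comprehend. The prediction job on my test set ran successfully, but the output file has more rows than the input:

Input: 1000 rows

Output: 2082 rows

The output looks like this: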

predictions.json <...>
{"File": "test.csv", "Line": "0", "Classes": [{"Name": "No", "Score": 0.7022}, {"Name": "Yes", "Score": 0.2892}, {"Name": "tag", "Score": 0.0086}]}
{"File": "test.csv", "Line": "1", "Classes": [{"Name": "No", "Score": 0.6252}, {"Name": "Yes", "Score": 0.3747}, {"Name": "tag", "Score": 0.0001}]}
{"File": "test.csv", "Line": "2", "Classes": [{"Name": "No", "Score": 0.9295}, {"Name": "Yes", "Score": 0.0705}, {"Name": "tag", "Score": 0.0}]}
{"File": "test.csv", "Line": "3", "Classes": [{"Name": "No", "Score": 0.5247}, {"Name": "Yes", "Score": 0.4753}, {"Name": "tag", "Score": 0.0}]}
...
{"File": "test.csv", "Line": "2080", "Classes": [{"Name": "No", "Score": 0.8528}, {"Name": "Yes", "Score": 0.1471}, {"Name": "tag", "Score": 0.0001}]}
{"File": "test.csv", "Line": "2081", "Classes": [{"Name": "No", "Score": 0.5318}, {"Name": "Yes", "Score": 0.4682}, {"Name": "tag", "Score": 0.0}]}

Can someone help me make sense of this output?

2 answers:

Answer 0 (score: 0)

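One option is to split each sentence into its own file, use the whole folder as the test set, and set this option in the job's input configuration: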

 "InputFormat": "ONE_DOC_PER_FILE"
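A minimal sketch of that approach, assuming the test set is a UTF-8 CSV with the text in the first column. The function names, the `doc_00000.txt` file naming, and the S3 prefix are illustrative, not part of the original answer. The second helper only builds the `InputDataConfig` dictionary; uploading the files and starting the job are left out.

```python
import csv
from pathlib import Path


def split_into_files(csv_path, out_dir, text_column=0):
    """Write each CSV record's text to its own UTF-8 file; return the file count."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    count = 0
    # newline="" lets the csv module handle line breaks inside quoted fields.
    with open(csv_path, newline="", encoding="utf-8") as f:
        for i, row in enumerate(csv.reader(f)):
            if not row:
                continue  # skip blank records
            # Hypothetical naming scheme: doc_<record index>.txt
            (out / f"doc_{i:05d}.txt").write_text(row[text_column], encoding="utf-8")
            count += 1
    return count


def input_data_config(s3_prefix):
    """Build the InputDataConfig for a job that reads a folder of single-document files."""
    return {"S3Uri": s3_prefix, "InputFormat": "ONE_DOC_PER_FILE"}
```

With one file per document, a line break inside a document can no longer be mistaken for a document boundary.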

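Another option is to count how many "\n" characters your dataset contains; extra line breaks inside the documents may be the cause of the error.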
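One way to check this, sketched under the assumption that the input is a UTF-8 CSV in which some text fields contain quoted line breaks: compare the number of physical lines with the number of CSV records. The function name is illustrative. If a one-document-per-line job treats every physical line as a document, the first number is the one that would match the output row count.

```python
import csv


def count_lines_and_records(csv_path):
    """Return (physical_lines, csv_records) for a UTF-8 CSV file."""
    # Physical lines: split the raw bytes on \n, \r\n, or \r.
    with open(csv_path, "rb") as f:
        physical_lines = len(f.read().splitlines())
    # Logical records: the csv module keeps quoted line breaks inside one field.
    with open(csv_path, newline="", encoding="utf-8") as f:
        records = sum(1 for _ in csv.reader(f))
    return physical_lines, records
```

If the two numbers differ, some records span several lines, and replacing the embedded line breaks with spaces before submitting the file would be a reasonable thing to try.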

Answer 1 (score: 0)

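I ran into the same problem. In my case, the error occurred because the prediction file (test.csv in your case) was not in the required encoding. AWS Comprehend requires UTF-8 encoding.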
AWS Docs Link
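
A small sketch for checking and fixing the encoding locally before uploading. The function names are illustrative, and the source encoding is something you need to know or find out yourself; the code does not guess it.

```python
def is_utf8(path):
    """Return True if the whole file decodes as UTF-8."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def convert_to_utf8(src_path, dst_path, source_encoding):
    """Re-encode a text file to UTF-8; the caller supplies the source encoding."""
    # newline="" on both sides keeps the original line endings unchanged.
    with open(src_path, encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)
```

For example, `convert_to_utf8("test.csv", "test_utf8.csv", "latin-1")` would apply if the file had been exported as Latin-1; the output file name here is only a placeholder.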