
Watson Retrieve and Rank / Discovery Service always returns the table of contents with the highest score

Date: 2017-08-15 22:09:00

Tags: ibm-cloud retrieve-and-rank watson-discovery

Background:

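I am using Watson Retrieve and Rank and/or the Discovery Service to retrieve information from user manuals. I trained it on a sample washing machine manual in PDF format. My goal is to get the best passage in the document for a given natural-language string (e.g. "Positioning the drain hose"). In general, this works.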

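My problem is that the table of contents is almost always the highest-scoring passage. As a result, the first result is just the table of contents instead of the relevant text passage (see the example results).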

"Wrong" result (table of contents):

Unpacking the washing machine ----------------------------------------------------2 Overview of the washing machine --------------------------------------------------2 Selecting a location -------------------------------------------------------------------- 3 Adjusting the leveling feet ------------------------------------------------------------3 Removing the shipping bolts --------------------------------------------------------3 Connecting the water supply hose ------------------------------------------------- 3 Positioning the drain hose ----------------------------------------------------------- 4 Plugging in the machine

"Correct" result:

Positioning the drain hose The end of the drain hose may be positioned in three ways: Over the edge of a sink The drain hose must be placed at a height of between 60 and 90 cm. To keep the drain hose spout bent, use the supplied plastic hose

Possible solutions:

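Ignore the table of contents during training
Use an offset parameter, e.g. skip the first 3 results
Check whether a result belongs to the table of contents and, if so, discard it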
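The third idea can be made independent of rank position by classifying each returned passage by its content. Below is a minimal Python sketch, not a Watson feature: `looks_like_toc`, `filter_passages` and the regex are hypothetical, and each passage is assumed to be a dict with a `passage_text` key (assumed to match the `passages` array of a Discovery query response; verify against your API version).

```python
import re
from typing import Dict, List

# Hypothetical heuristic (not a Watson feature): a ToC entry ends with a run of
# at least five leader characters ('-' or '.') followed by a page number.
LEADER_AND_PAGE = re.compile(r"[-.]{5,}\s*\d+")


def looks_like_toc(passage_text: str, min_entries: int = 3) -> bool:
    """True if the passage contains several 'title ----- page' entries."""
    return len(LEADER_AND_PAGE.findall(passage_text)) >= min_entries


def filter_passages(passages: List[Dict]) -> List[Dict]:
    """Drop ToC-like passages while keeping the service's ranking order.

    Assumes each passage is a dict with a 'passage_text' key.
    """
    return [p for p in passages if not looks_like_toc(p.get("passage_text", ""))]


if __name__ == "__main__":
    toc = ("Unpacking the washing machine ---------2 "
           "Overview of the washing machine --------2 "
           "Selecting a location -------- 3 Plugging in the machine")
    body = ("Positioning the drain hose The end of the drain hose may be "
            "positioned in three ways: Over the edge of a sink")
    ranked = [{"passage_text": toc}, {"passage_text": body}]
    # prints ['Positioning the drain hose']
    print([p["passage_text"][:26] for p in filter_passages(ranked)])
```

Because it inspects content rather than position, it behaves the same whether the table of contents is at the start, at the end, or absent. It will still miss tables of contents laid out without leader characters, and it can discard a genuine passage that happens to contain several such runs.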

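These approaches are static and do not generalize to multiple documents with different structures (table of contents at the beginning or at the end, no table of contents, ...).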

Does anyone have an idea how to handle this better?

1 answer:

Answer 0 (score: 0):

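At this time, passage retrieval results are not affected by relevance training. Because passage retrieval always searches the entire corpus, the only reliable way to keep table-of-contents passages out of the results is, unfortunately, to remove the table of contents from the documents.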
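Following the answer's suggestion, the table of contents can be stripped from the manual's text before upload. Below is a minimal Python sketch, assuming the PDF has already been converted to plain text (conversion and upload are not shown). `strip_toc` and its regex are hypothetical and rely on the same "title, leader dashes or dots, page number" heuristic, so the output should be checked for each manual.

```python
import re

# Hypothetical heuristic: one ToC entry is a title (letters, no digits, dashes,
# dots or newlines), a run of at least five '-' or '.' leaders, then a page number.
TOC_ENTRY = re.compile(r"[A-Za-z][^-.\d\n]*[-.]{5,}[ \t]*\d+")


def strip_toc(text: str) -> str:
    """Delete ToC-style entries from extracted manual text before upload."""
    without_toc = TOC_ENTRY.sub("", text)
    # Collapse runs of spaces/tabs left behind where entries were removed.
    without_toc = re.sub(r"[ \t]{2,}", " ", without_toc)
    # Trim each line and drop lines that became empty.
    lines = (line.strip() for line in without_toc.splitlines())
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    manual = (
        "Contents\n"
        "Unpacking the washing machine ----------2\n"
        "Selecting a location ---------- 3\n"
        "Positioning the drain hose\n"
        "The end of the drain hose may be positioned in three ways."
    )
    print(strip_toc(manual))
```

The pattern removes entries wherever they occur, whether one per line or flattened into a single line, so the position of the table of contents in the manual does not matter. A bare "Contents" heading is left in place.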