
How do I build a corpus from unstructured text (PDF, TXT, HTML), train IBM Watson on it, and then ask questions through API calls?

Posted: 2017-04-09 15:25:11

Tags: machine-learning ibm-watson

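I want to use some unstructured data (PDF, TXT, and HTML files) to train a machine learning system such as IBM Watson, and then ask questions and get answers through API calls. How can I do this? Should the training be GUI-based or API-based? From the Bluemix catalog, it is hard to tell which service best fits this requirement. Can you suggest the best option?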

2 answers:

Answer 0 (score: 1)

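Retrieve and Rank: Retrieve and Rank surfaces the most relevant information from a collection of documents. For example, with R&R, experienced technicians can quickly find solutions in dense product manuals, and contact-center agents can quickly find answers, which reduces average call handling time. The Retrieve and Rank service works "out of the box", but it can also be customized to improve results. More details here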
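As a rough illustration of how a question reached Retrieve and Rank, the sketch below only builds the request URL for the `fcselect` endpoint, which retrieved candidate documents from Solr and re-ranked them with a trained ranker. The gateway URL, cluster ID, collection name, and ranker ID are placeholders (assumptions), credentials are not handled, and no request is sent. Retrieve and Rank has since been retired in favor of Discovery, so treat this as a historical sketch rather than a working client.

```python
from urllib.parse import quote, urlencode

# Assumption: Bluemix-era gateway URL; substitute the URL from your service credentials.
BASE_URL = "https://gateway.watsonplatform.net/retrieve-and-rank/api"


def build_rr_query_url(cluster_id: str, collection: str, ranker_id: str,
                       question: str, rows: int = 10) -> str:
    """Build a GET URL that asks Solr for candidates and re-ranks them with a trained ranker."""
    path = (f"/v1/solr_clusters/{quote(cluster_id)}"
            f"/solr/{quote(collection)}/fcselect")
    params = {
        "q": question,           # the user's question, sent as the Solr query
        "ranker_id": ranker_id,  # ranker trained on question/answer relevance data
        "wt": "json",            # ask Solr for a JSON response
        "rows": rows,            # number of ranked documents to return
    }
    return BASE_URL + path + "?" + urlencode(params)


# Hypothetical IDs, for illustration only.
url = build_rr_query_url("sc_example", "manuals", "ranker_example",
                         "How do I reset the device?")
print(url)
```

The ranker is what made the service "customizable": without a `ranker_id`, the same cluster could be queried through plain Solr search, and training a ranker on relevance judgments is what improved the ordering of results.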

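Discovery service: extracts value from unstructured data by converting, normalizing, and enriching it. You can explore that data with a simplified query language, or quickly take advantage of pre-enriched datasets such as the Discovery News collection. More details here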
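To get a PDF or HTML file into Discovery for conversion and enrichment, each file is uploaded to a collection. The sketch below builds, but does not send, a multipart `POST` to the Discovery v1 "add document" endpoint using only the Python standard library. The gateway URL, the `version` date, and the environment and collection IDs are assumptions; take the real values from your service instance, and add your credentials before sending.

```python
import mimetypes
import uuid
from urllib.parse import urlencode
from urllib.request import Request

# Assumptions: gateway URL and API version date; use the values for your instance.
BASE_URL = "https://gateway.watsonplatform.net/discovery/api"
API_VERSION = "2017-11-07"


def build_upload_request(env_id: str, collection_id: str,
                         file_name: str, content: bytes) -> Request:
    """Build (but do not send) a multipart POST that adds one document to a collection."""
    boundary = uuid.uuid4().hex
    # Guess the MIME type from the file extension (e.g. .pdf, .html).
    mime = mimetypes.guess_type(file_name)[0] or "application/octet-stream"
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{file_name}"\r\n'.encode(),
        f"Content-Type: {mime}\r\n\r\n".encode(),
        content,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    query = urlencode({"version": API_VERSION})
    url = (f"{BASE_URL}/v1/environments/{env_id}"
           f"/collections/{collection_id}/documents?{query}")
    # Credentials are intentionally left out; add an Authorization header before sending.
    return Request(url, data=body, method="POST",
                   headers={"Content-Type": f"multipart/form-data; boundary={boundary}"})


# Hypothetical IDs and file content, for illustration only.
req = build_upload_request("env_example", "coll_example", "manual.pdf", b"%PDF-1.4 ...")
print(req.get_method(), req.full_url)
```

After adding credentials with `req.add_header(...)`, the request could be sent with `urllib.request.urlopen(req)`; an official Watson SDK would wrap the same call, but the raw request makes the shape of the upload explicit.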

Answer 1 (score: 0)

I would recommend Watson Discovery (https://www.ibm.com/watson/services/discovery) for your purpose. It is a complete service, and most of its features are available through both a web GUI and the API. It accepts questions either in natural language or in its structured query format (the Discovery Query Language).
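The two ways of asking questions map to different parameters of the Discovery v1 query endpoint: `natural_language_query` for a plain-English question and `query` for a Discovery Query Language expression. The sketch below only builds the query URL; the gateway URL, `version` date, and IDs are assumptions, and the `text:reset` filter assumes the converted document text is stored in a field named `text`.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumptions: gateway URL and API version date; use the values for your instance.
BASE_URL = "https://gateway.watsonplatform.net/discovery/api"
API_VERSION = "2017-11-07"


def build_query_url(env_id: str, collection_id: str, *,
                    natural_language: Optional[str] = None,
                    dql: Optional[str] = None, count: int = 5) -> str:
    """Build a Discovery query URL from either a natural-language question or a DQL query."""
    if (natural_language is None) == (dql is None):
        raise ValueError("pass exactly one of natural_language or dql")
    params = {"version": API_VERSION, "count": count}
    if natural_language is not None:
        params["natural_language_query"] = natural_language  # plain-English question
    else:
        params["query"] = dql  # Discovery Query Language, e.g. field:value
    return (f"{BASE_URL}/v1/environments/{env_id}"
            f"/collections/{collection_id}/query?" + urlencode(params))


# Hypothetical IDs, for illustration only.
nl_url = build_query_url("env_example", "coll_example",
                         natural_language="How do I reset the device?")
dql_url = build_query_url("env_example", "coll_example", dql="text:reset")
print(nl_url)
print(dql_url)
```

Requiring exactly one of the two arguments keeps each call unambiguous: a natural-language question suits end users typing free text, while DQL suits programmatic filters over enriched fields.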

Its documentation is here: https://console.bluemix.net/docs/services/discovery/getting-started.html#getting-started-with-the-api

If you create a free instance of Watson Discovery, you can test its API here: https://watson-api-explorer.mybluemix.net/apis/discovery-v1

There are examples of each API call here: https://www.ibm.com/watson/developercloud/discovery/api/v1/

There is also a demo at https://discovery-news-demo.mybluemix.net/?cm_mc_uid=30407807098515090430617&cm_mc_sid_50200000=1509636542&cm_mc_sid_52640000=1509636542, with its source code at https://github.com/watson-developer-cloud/discovery-nodejs