How do I build a corpus from unstructured text (PDF, TXT, HTML) and train IBM Watson, then ask questions through API calls?

Time: 2017-04-09 15:25:11

Tags: machine-learning ibm-watson

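I want to use unstructured data such as PDF, TXT, and HTML files to train a machine learning system such as IBM Watson, then ask questions through API calls and get answers. How can I do this? Should the training be GUI-based or API-based? Looking at Bluemix, it is hard to tell which service best fits this requirement. Can you suggest the best option?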

2 answers:

Answer 0 (score: 1)

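Retrieve and Rank - Retrieve and Rank can surface the most relevant information from a collection of documents. For example, using R&R, an experienced technician can quickly find solutions in dense product manuals. Contact center agents can also find answers quickly, which improves average call handling time. The Retrieve and Rank service works "out of the box," but it can also be customized to improve results. More details here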

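Discovery service - Extract value from unstructured data by converting, normalizing, and enriching it. Use a simplified query language to explore that data, or quickly tap into pre-enriched datasets such as the Discovery News collection. More details here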

Answer 1 (score: 0)

I would recommend Watson Discovery (https://www.ibm.com/watson/services/discovery) for your purpose. It's very complete and supports many features in both GUI and API. It supports questions in natural language or in query format.
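
To make the "natural language or query format" distinction concrete, here is a minimal sketch that only builds the request URL for a Discovery v1 collection query; it does not send anything. The base URL, the version date, and the environment and collection IDs are assumptions for illustration, so take the real values from your own service instance. Authentication with your service credentials is also needed for a real call and is left out here.

```python
from urllib.parse import urlencode

# Assumptions: the base URL and version date below follow Discovery v1
# conventions from around the time of this answer; your instance may differ.
BASE_URL = "https://gateway.watsonplatform.net/discovery/api/v1"
API_VERSION = "2017-11-07"


def build_query_url(environment_id, collection_id, text,
                    natural_language=True, count=3):
    """Build (but do not send) the GET URL for a collection query."""
    # natural_language_query carries a plain-English question;
    # query carries an expression in the Discovery query language.
    key = "natural_language_query" if natural_language else "query"
    params = {"version": API_VERSION, key: text, "count": count}
    path = (f"{BASE_URL}/environments/{environment_id}"
            f"/collections/{collection_id}/query")
    return f"{path}?{urlencode(params)}"


if __name__ == "__main__":
    # "my-env-id" and "my-coll-id" are hypothetical placeholders.
    print(build_query_url("my-env-id", "my-coll-id",
                          "How do I reset the device?"))
    print(build_query_url("my-env-id", "my-coll-id",
                          "text:reset", natural_language=False))
```

The resulting URL can be pasted into the API explorer or requested with any HTTP client once credentials are supplied; the response is JSON containing the matching documents.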

The Watson Discovery documentation is here: https://console.bluemix.net/docs/services/discovery/getting-started.html#getting-started-with-the-api

If you create a free instance of Watson Discovery, you can test its API here: https://watson-api-explorer.mybluemix.net/apis/discovery-v1

There are examples of each API call here: https://www.ibm.com/watson/developercloud/discovery/api/v1/

There is also a demo here: https://discovery-news-demo.mybluemix.net/?cm_mc_uid=30407807098515090430617&cm_mc_sid_50200000=1509636542&cm_mc_sid_52640000=1509636542 and its source code is here: https://github.com/watson-developer-cloud/discovery-nodejs