Question

I'm trying to use Watson's Document Conversion service to convert a set of HTML documents into answer units. About a third of the documents are processed fine; the rest fail with this error:

The Media Type [application/octet-stream] of the input document is not supported. Auto correction was attempted, but the auto detected media type [text/plain] is also not supported. Supported Media Types are: application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml.

This happens with the same documents (example below) whether I submit them through the watson-developer-cloud Node.js library or through the demo at https://document-conversion-demo.mybluemix.net/, with one exception: when using the online demo, if I rename the file with a .html extension before uploading, it is processed successfully.

I suspect I'm failing to do something I should, such as stating the file type explicitly, but I can't figure out how to do that with the Node.js watson-developer-cloud library. The code I'm using looks like this:

document_conversion.convert(
    {
        file: {value: new Buffer(content), options: {}},
        conversion_target: "ANSWER_UNITS",
        type: "text/html"
    },
    function (err, response) {
        ...

Can anyone help?
Answer 1

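There were in fact two problems: I was using the wrong keyword, and it was in the wrong place. I had to use contentType rather than type to describe the MIME type, and it had to go in the options field of the file parameter, like so: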

document_conversion.convert({file: {value: new Buffer(content), options: {contentType: "text/html"}}, conversion_target: "ANSWER_UNITS" }, function (err, response) {...

Many thanks to Joe Kozhaya for setting me straight.

Answer 2

I posted a similar answer on your other question, but as of v1.7.0 this is an officially supported feature of the library:

document_conversion.convert({
  file: new Buffer(content),
  content_type: "text/html",
  conversion_target: "ANSWER_UNITS"
}, function (err, response) {
  //...
});

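Note that the key here is content_type (to be consistent with the rest of the library). options.contentType happened to work because it is passed through to request unmodified, but content_type is now a tested and documented feature.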
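To make the difference between the two forms concrete, here is a minimal sketch that only builds the parameter objects and does not call the service. The helper names buildLegacyParams and buildParams are hypothetical and are not part of watson-developer-cloud; the object shapes are taken from the answers above, and Buffer.from is used in place of the deprecated new Buffer(...).

```javascript
// Hypothetical helpers for illustration; they are NOT part of watson-developer-cloud.

// Workaround from Answer 1: the media type sits in file.options.contentType
// and is passed through to the underlying request library unmodified.
function buildLegacyParams(content, mediaType) {
  return {
    file: { value: Buffer.from(content), options: { contentType: mediaType } },
    conversion_target: "ANSWER_UNITS"
  };
}

// Form described for v1.7.0 and later: a top-level content_type key.
function buildParams(content, mediaType) {
  return {
    file: Buffer.from(content),
    content_type: mediaType,
    conversion_target: "ANSWER_UNITS"
  };
}

const html = "<html><body><h1>Title</h1><p>Body text</p></body></html>";
const legacy = buildLegacyParams(html, "text/html");
const current = buildParams(html, "text/html");

console.log(legacy.file.options.contentType);
console.log(current.content_type);
```

Either object would then be passed as the first argument to document_conversion.convert, with the callback as the second argument.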

Answer 3

The answer is in the error message. The supported media types are:

- application/msword
- application/vnd.openxmlformats-officedocument.wordprocessingml.document
- application/pdf
- text/html
- application/xhtml+xml

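Your input data was detected as text/plain, not text/html. That is why simply renaming the offending file to .html works: the extension is enough for the underlying file-magic detection to classify the input document as text/html rather than text/plain.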

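You can also force the input type to text/html by passing the parameter "type=text/html" on the API call. I would therefore recommend doing that for any files detected as "plain text"; it is better to keep those input files under their original names.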

For details, see the API documentation (https://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/document-conversion/api/v1/).
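One way to avoid relying on auto-detection is to choose the media type on the client and always send it explicitly. The sketch below is only an illustration: mediaTypeFor and the extension table are assumptions made for this example, not the service's actual detection logic, and the media types are the ones listed in the error message.

```javascript
const path = require("path");

// Assumed mapping from file extension to the media types named in the error message.
const BY_EXTENSION = {
  ".doc": "application/msword",
  ".docx": "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
  ".pdf": "application/pdf",
  ".html": "text/html",
  ".htm": "text/html",
  ".xhtml": "application/xhtml+xml"
};

// Hypothetical helper: return the media type to send with the request.
// Files with no extension or an unknown one get the caller-supplied fallback.
function mediaTypeFor(filename, fallback) {
  const ext = path.extname(filename).toLowerCase();
  return BY_EXTENSION[ext] || fallback;
}

// A file keeps its original name; the type is forced to text/html instead.
console.log(mediaTypeFor("article-without-extension", "text/html"));
console.log(mediaTypeFor("report.PDF", "text/html"));
```

The returned value could then be supplied as the content type on the conversion call, so the files never need to be renamed.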

Why am I getting a 415 error from Watson's Document Conversion service on some documents?

3 answers: