How to index a JSON document directly from memory

Time: 2017-08-14 14:39:06

Tags: json node.js watson watson-discovery

I'm trying to index a JSON document, but it isn't working. So far I have tried the solutions posted at https://developer.ibm.com/answers/questions/361808/adding-a-json-document-to-a-discovery-collection-u/, but none of them work.

If I try:

    discovery.addDocument({
        environment_id: config.watson.environment_id,
        collection_id: config.watson.collection_id,
        file: JSON.stringify({
            "ocorrencia_id": 9001
        })
    }, (error, data) => {
        if (error) {
            console.error(error);
            return;
        }

        console.log(data);
    });

It returns this error:

    The Media Type [text/plain] of the input document is not supported. Auto correction was attempted, but the auto detected media type [text/plain] is also not supported. Supported Media Types are: application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml .

On the other hand, if I try:

    discovery.addDocument({
        environment_id: config.watson.environment_id,
        collection_id: config.watson.collection_id,
        file: JSON.parse(JSON.stringify({
            "ocorrencia_id": 9001
        }))
    }, (error, data) => {
        if (error) {
            console.error(error);
            return;
        }

        console.log(data);
    });

I get this error:

    TypeError: source.on is not a function
        at Function.DelayedStream.create (C:\Temp\teste-watson\watson-orchestrator\node_modules\delayed-stream\lib\delayed_stream.js:33:10)
        at FormData.CombinedStream.append (C:\Temp\teste-watson\watson-orchestrator\node_modules\combined-stream\lib\combined_stream.js:43:37)
        at FormData.append (C:\Temp\teste-watson\watson-orchestrator\node_modules\form-data\lib\form_data.js:68:3)
        at appendFormValue (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:324:21)
        at Request.init (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:337:11)
        at new Request (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\request.js:130:8)
        at request (C:\Temp\teste-watson\watson-orchestrator\node_modules\request\index.js:54:10)
        at createRequest (C:\Temp\teste-watson\watson-orchestrator\node_modules\watson-developer-cloud\lib\requestwrapper.js:177:10)
        at DiscoveryV1.addDocument (C:\Temp\teste-watson\watson-orchestrator\node_modules\watson-developer-cloud\discovery\v1.js:516:10)
        at client.query.then.res (C:\Temp\teste-watson\watson-orchestrator\populate\populate.js:36:13)
        at process._tickCallback (internal/process/next_tick.js:109:7)

Likewise, by saving to a temporary file and then using it:

    const fs = require('fs');
    const tempy = require('tempy');
    const f = tempy.file({extension: 'json'});
    fs.writeFileSync(f, JSON.stringify({
        "ocorrencia_id": 9001
    }));

    discovery.addDocument({
        environment_id: config.watson.environment_id,
        collection_id: config.watson.collection_id,
        file: fs.readFileSync(f)
    }, (error, data) => {
        if (error) {
            console.error(error);
            return;
        }

        console.log(data);
    });

Then this happens:

    The Media Type [application/octet-stream] of the input document is not supported. Auto correction was attempted, but the auto detected media type [text/plain] is also not supported. Supported Media Types are: application/json, application/msword, application/vnd.openxmlformats-officedocument.wordprocessingml.document, application/pdf, text/html, application/xhtml+xml .

Considering that other posts suggest using JSON.parse(), the API seems to accept a JS object, but there is no example of that, and nothing I have tried so far has worked. Could this be a bug?

UPDATE: By saving to a temporary file and then using createDataStream() [presumably fs.createReadStream()] instead of readFileSync(), it works, but I still have to go through the disk for information that is already in memory.

I also tried to create an in-memory stream from a Readable, but that failed too:

    var Readable = require('stream').Readable;
    var s = new Readable();
    s._read = function noop() {}; // redundant? see update below
    s.push(JSON.stringify({
        "ocorrencia_id": 9001
    }));
    s.push(null);

    discovery.addDocument({
        environment_id: config.watson.environment_id,
        collection_id: config.watson.collection_id,
        file: s
    }, (error, data) => {
        if (error) {
            console.error(error);
            return;
        }

        console.log(data);
    });

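2 Answers:

Answer 0 (score: 2)

The service checks the filename and then the content to determine the type, but it does not seem to recognize JSON content correctly; it only sees text. The other answer will work as long as the filename ends in .json (it doesn't care about the contentType).

However, we have added .addJsonDocument() and .updateJsonDocument() methods to the node.js SDK to make this easier: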

    discovery.addJsonDocument({
        environment_id: config.watson.environment_id,
        collection_id: config.watson.collection_id,

        // note: no JSON.stringify needed with addJsonDocument()
        file: {
            "ocorrencia_id": 9001
        }
    }, (error, data) => {
        if (error) {
            console.error(error);
            return;
        }

        console.log(data);
    });
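
If the installed SDK version might not have addJsonDocument() yet, a fallback along these lines could help. This is only a sketch: the helper name addJsonFromMemory, the filename in-memory.json, and the stand-in client with its made-up response are assumptions for illustration and are not part of the SDK.

```javascript
// Hypothetical helper (not part of the SDK): prefer addJsonDocument() when the
// installed client has it, otherwise fall back to addDocument() with an
// explicit JSON file part.
function addJsonFromMemory(discovery, ids, doc, callback) {
  const base = {
    environment_id: ids.environment_id,
    collection_id: ids.collection_id
  };

  if (typeof discovery.addJsonDocument === 'function') {
    // Newer clients: pass the plain object, no JSON.stringify needed.
    return discovery.addJsonDocument(Object.assign({}, base, { file: doc }), callback);
  }

  // Older clients: send the serialized JSON under a filename ending in .json
  // (the name itself is an arbitrary, assumed value).
  return discovery.addDocument(Object.assign({}, base, {
    file: {
      value: JSON.stringify(doc),
      options: { filename: 'in-memory.json', contentType: 'application/json' }
    }
  }), callback);
}

// Stand-in client so the sketch runs without credentials; the response object
// is invented and does not reflect the real service output.
const calls = [];
const fakeOldClient = {
  addDocument(params, cb) {
    calls.push(params);
    cb(null, { note: 'stand-in response' });
  }
};

addJsonFromMemory(
  fakeOldClient,
  { environment_id: 'env', collection_id: 'col' },
  { ocorrencia_id: 9001 },
  (error, data) => {
    if (error) {
      console.error(error);
      return;
    }
    console.log(data);
  }
);
```

With a real client, the same call would take the discovery instance in place of fakeOldClient, and nothing touches the disk in either branch.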

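Answer 1 (score: 0)

The problem you are facing is caused by the missing content type (it defaults to text/plain). When you provide the document to upload as a string, you need to supply both the content type and a filename. In this case, you could try the following: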

    discovery.addDocument({
        // other required parameters
        file: {
            value: JSON.stringify({ "ocorrencia_id": 9001 }),
            options: {
                filename: "some_file_name",
                contentType: "application/json; charset=utf-8"
            }
        }
    }, callbackFn)
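
Since the other answer points out that the service looks at the filename and expects it to end in .json, it may be safer to build the file value with a small helper that enforces the extension. The sketch below is only an illustration: jsonFilePart and the default name document are hypothetical, not something the SDK provides.

```javascript
// Hypothetical helper: build the `file` value for an in-memory JSON document,
// making sure the filename ends in .json.
function jsonFilePart(doc, name) {
  const base = name || 'document'; // assumed default name
  // Case-insensitive check so "data.JSON" is left untouched.
  const filename = /\.json$/i.test(base) ? base : base + '.json';
  return {
    value: JSON.stringify(doc),
    options: {
      filename: filename,
      contentType: 'application/json; charset=utf-8'
    }
  };
}

const part = jsonFilePart({ ocorrencia_id: 9001 }, 'some_file_name');
console.log(part.options.filename);
```

The returned object can then be passed as the file parameter of discovery.addDocument() in place of the literal shown above.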