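IBM Watson: Recursively adding documents to a collection

Time: 2017-08-08 12:42:33

Tags: java ibm-cloud watson-discovery

I have a Discovery instance in my IBM Bluemix account, and I want to add the documents in a local folder to a private collection in that instance. I do this by calling a recursive function, starting from the top-level local folder. The program itself works, but after a few iterations of adding documents I get the following error: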

Aug 08, 2017 1:55:07 PM okhttp3.internal.platform.Platform log
INFO: --> POST https://gateway.watsonplatform.net/discovery/api/v1/environments/{environmentId}/collections/{collectionId}/documents?version=2017-08-01 http/1.1 (-1-byte body)
Aug 08, 2017 1:59:09 PM okhttp3.internal.platform.Platform log
INFO: <-- HTTP FAILED: java.net.SocketException: Connection reset by peer: socket write error
Aug 08, 2017 1:59:10 PM okhttp3.internal.platform.Platform log
INFO: --> POST https://gateway.watsonplatform.net/discovery/api/v1/environments/{environmentId}/collections/{collectionId}/documents?version=2017-08-01 http/1.1 (-1-byte body)
Aug 08, 2017 1:59:10 PM okhttp3.internal.platform.Platform log
INFO: <-- HTTP FAILED: java.io.IOException: Stream Closed
Exception in thread "main" java.lang.RuntimeException: java.io.IOException: Stream Closed

The way it works is that I first initialize the Discovery instance:

Discovery discovery = new Discovery("2017-08-01");
discovery.setEndPoint("https://gateway.watsonplatform.net/discovery/api");
discovery.setUsernameAndPassword({username}, {password});

Then, for every file f in the folder whose MIME type mimetype is supported, I do this:

CreateDocumentRequest.Builder builder = new CreateDocumentRequest.Builder({environmentId}, {collectionId}).file(f, mimetype);
CreateDocumentResponse createResponse = discovery.createDocument(builder.build()).execute();

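Could the Discovery instance be timing out during the loop? Should I initialize a new Discovery instance for every request?

Update

I am fairly sure the exception is caused by a connection problem. I am now trying to add documents in such a way that the Discovery instance is re-initialized whenever the connection is lost. However, it still gives INFO: <-- HTTP FAILED: java.io.IOException: Stream Closed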

boolean successful;
do {
    try {
        CreateDocumentResponse createResponse = this.discovery.createDocument(builder.build()).execute();
        System.out.println(createResponse.toString());
        successful = true;
    } catch (Exception e) {
        System.err.println("Exception: " + e.getMessage());
        try {
            TimeUnit.MILLISECONDS.sleep(500);
        } catch (InterruptedException e1) {
            System.err.println("InterruptedException: " + e1.getMessage());
        }
        this.discovery = new Discovery("2017-08-01");
        this.discovery.setEndPoint("https://gateway.watsonplatform.net/discovery/api");
        this.discovery.setUsernameAndPassword(DataUploader.USERNAME, DataUploader.PASSWORD);
        successful = false;
    }
} while (!successful);

1 Answer:

Answer 0 (score: 0)

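Based on what you have included, your approach looks reasonable. You do not have to create a new instance of the Discovery class for every add-document request. I think the core issue here is the handling of the file stream you pass to CreateDocumentRequest.Builder. As far as I can tell, the file stream is being closed prematurely.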
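To illustrate what "closed prematurely" can look like, here is a minimal JDK-only sketch. It does not use the Watson SDK, and the class and method names (StreamClosedDemo, openAndCloseTooEarly, readFailureMessage) are made up for this example. It only shows that reading a java.io.FileInputStream after close() fails with an IOException, which on OpenJDK usually carries the same "Stream Closed" message that appears in your log.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class; it stands apart from the Watson SDK on purpose.
class StreamClosedDemo {

    // Returns the stream from inside try-with-resources, so the caller
    // receives a stream that has already been closed.
    static InputStream openAndCloseTooEarly(Path file) throws IOException {
        try (InputStream in = new FileInputStream(file.toFile())) {
            return in; // close() runs as soon as the try block exits
        }
    }

    // Returns the message of the IOException raised by read(),
    // or null if the read succeeds.
    static String readFailureMessage(InputStream in) {
        try {
            in.read();
            return null;
        } catch (IOException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("doc", ".json");
        Files.write(file, "{\"title\":\"example\"}".getBytes(StandardCharsets.UTF_8));
        // Prints the failure message produced by reading the closed stream.
        System.out.println(readFailureMessage(openAndCloseTooEarly(file)));
        Files.delete(file);
    }
}
```

If your code builds the request from an InputStream and that stream is closed (or already consumed by an earlier attempt) before execute() reads it, I would expect a failure of this kind.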

Here is a Scala example that does something similar: it uploads all the files in a folder, and it runs without problems.

import java.nio.file.{Files, Paths}
import com.ibm.watson.developer_cloud.discovery.v1.Discovery
import com.ibm.watson.developer_cloud.discovery.v1.model.document.CreateDocumentRequest
import com.ibm.watson.developer_cloud.http.HttpMediaType

object Run {
  def main(args: Array[String]): Unit = {
    if(args.length == 0) {
      println("Usage: <app> <folder-to-upload>")
      System.exit(0)
    }

    val discovery = new Discovery("2017-08-01")
    discovery.setEndPoint("https://gateway.watsonplatform.net/discovery/api")
    discovery.setUsernameAndPassword("{username}", "{password}")

    val environmentId = "<environment-id>"
    val collectionId = "<collection-id>"

    Files.list(Paths.get(args(0))).forEach { path =>
      println(s"Processing ${path.getFileName}")
      val createDocumentBuilder = new CreateDocumentRequest.Builder(environmentId, collectionId)
        .file(path.toFile, HttpMediaType.APPLICATION_JSON)
      val response = discovery.createDocument(createDocumentBuilder.build()).execute()
      println(s"DocumentID ${response.getDocumentId}")
    }
  }
}
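Since your question is in Java and descends into sub-folders, the traversal part could be sketched roughly as below. This is only an outline under assumptions: RecursiveUploadSketch, listFilesRecursively and uploadAll are hypothetical names, and the uploader callback is a placeholder for wherever you call discovery.createDocument(...).execute() with the file and its MIME type.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper; only the folder traversal is shown here.
class RecursiveUploadSketch {

    // Collects every regular file below root, including sub-folders, in sorted order.
    static List<Path> listFilesRecursively(Path root) throws IOException {
        // Files.walk keeps directory handles open, so close the stream when done.
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    // Hands each file to the uploader and returns how many files were handed over.
    // The uploader is a placeholder (assumption) for the createDocument call.
    static int uploadAll(Path root, Consumer<Path> uploader) throws IOException {
        List<Path> files = listFilesRecursively(root);
        for (Path file : files) {
            uploader.accept(file);
        }
        return files.size();
    }
}
```

A single Discovery instance can be captured by the uploader and reused for every file, for example uploadAll(Paths.get(folder), path -> upload(discovery, path)), where upload is whatever method wraps your existing request code.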

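From the snippet provided I cannot tell which method you are using, but given the choice I would use Builder#file(File inputFile, String mediaType) rather than Builder#file(InputStream content, String mediaType). Otherwise, you must make sure the stream is not closed until the request has been built and sent to the server.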
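If you do stay with an InputStream and also want to retry, one possible pattern is to open a fresh stream for every attempt instead of reusing a request built around a stream that the failed attempt may have consumed or closed. That might be why your retry loop fails immediately with Stream Closed after the connection reset, though I cannot confirm it from the snippet. The sketch below is JDK-only: FreshStreamRetry, the Upload interface and the parameter names are hypothetical, and Upload stands in for building the request and calling createDocument(...).execute().

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; not part of the Watson SDK.
class FreshStreamRetry {

    // Stand-in (assumption) for the code that builds and sends one request.
    interface Upload {
        String send(InputStream content) throws IOException;
    }

    // Tries the upload up to maxAttempts times, opening a new stream each time.
    static String uploadWithRetry(Path file, Upload upload, int maxAttempts, long backoffMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            // The stream stays open until send() has returned or thrown.
            try (InputStream in = Files.newInputStream(file)) {
                return upload.send(in);
            } catch (IOException e) {
                last = e;
                if (attempt < maxAttempts) {
                    // Simple linear backoff before the next attempt.
                    Thread.sleep(backoffMillis * attempt);
                }
            }
        }
        throw last; // every attempt failed; surface the most recent error
    }
}
```

Bounding the number of attempts also avoids the endless loop that a do/while retry can turn into when the failure is permanent.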