Question

I am quite new to MongoDB and am using it in my Java project.

I have the following document structure in my collection:

{
  "_id": "ProcessX",
  "tasks": [
    {
      "taskName": "TaskX",
      "taskTime": "2018-08-09T13:38:58.317Z",
      "crawledList": [
        "http://dbpedia.org/ontology/birthYear"
      ]
    },
    {
      "taskName": "TaskX",
      "taskTime": "2018-08-10T06:19:32.006Z",
      "crawledList": [
        "http://dbpedia.org/ontology/birthYear",
        "http://dbpedia.org/page/Mo_Chua_of_Balla"
      ]
    },
    {
      "taskName": "TaskY",
      "taskTime": "2018-08-10T06:21:58.737Z",
      "crawledList": [
        "http://dbpedia.org/page/Mo_Chua_of_Balla"
      ]
    }
  ]
}

I want to add a "newURI" to a task's crawledList if it does not already exist there. Here is the process:

1. Find the process document with _id = "someProcessName".
2. Find the task document in the tasks array with taskName = "someTaskName" and taskTime = "someTaskTime".
3. Check whether "newURI" exists in the crawledList of that task document.
4. If it does not exist, insert newURI into the crawledList of the task document.

I don't want to retrieve documents into memory and work with primitive Java types (Lists etc.). Can you help me write the most efficient code using the MongoDB Java driver?

I don't have any indexes defined because I don't know which indexes I should define. I can also change the document structure if there is a better way to represent them and do this operation faster.

Thank you in advance.

Answer 1

Finally, by reading the Java driver documentation and searching the web, I managed to implement the following two functions:

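import static com.mongodb.client.model.Filters.*;
import static com.mongodb.client.model.Updates.push;

// True if the current task's crawledList already contains the IRI.
public boolean crawledBefore(IRI iri) {
    return collection.countDocuments(
            and(eq("_id", CrawlProcess.getProcessName()),
                    elemMatch("tasks", and(eq("taskName", CrawlProcess.getTaskName()),
                                            eq("taskTime", CrawlProcess.getCreationTime()),
                                            in("crawledList", iri.toString()))))) != 0;
}

// Appends the IRI to the current task's crawledList unless it is already there.
public void addToStore(IRI iri) {
    if (!crawledBefore(iri)) {
        collection.updateOne(
                and(eq("_id", CrawlProcess.getProcessName()),
                    elemMatch("tasks", and(eq("taskName", CrawlProcess.getTaskName()),
                                            eq("taskTime", CrawlProcess.getCreationTime())))),
                // "$" refers to the first tasks element matched by elemMatch
                push("tasks.$.crawledList", iri.toString()));
    }
}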

Here is how it works:

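The crawledBefore() function takes an IRI and checks whether any document has that IRI in the crawledList array of the task document, which is an embedded document inside the process document. A process document with the given process name, task name and task time always exists in my collection; all I check here is whether the IRI is present in that document.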

If it is not, the second function adds the new IRI to the crawledList of that specific task document in the process document.
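
The check-then-push pair above takes two round trips, and two crawlers adding the same IRI at the same moment could both pass the check. As a minimal sketch, the same effect can probably be had with a single updateOne call using $addToSet, which appends a value only when the array does not already contain it. The class and method names below are hypothetical and not part of the original answer, and the sketch assumes that taskTime is stored as a string (as in the sample document) and that a MongoCollection<Document> is at hand. Because the filter starts from _id, which MongoDB indexes by default, this lookup should not need an additional index.

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.Updates;
import com.mongodb.client.result.UpdateResult;
import org.bson.Document;
import org.bson.conversions.Bson;

// Hypothetical helper; names are illustrative only.
final class CrawledListUpdater {

    // Selects the process document and, via $elemMatch, the single task
    // that the positional "$" in the update will refer to.
    // Assumption: taskTime is stored as a string.
    static Bson taskFilter(String processName, String taskName, String taskTime) {
        return Filters.and(
                Filters.eq("_id", processName),
                Filters.elemMatch("tasks", Filters.and(
                        Filters.eq("taskName", taskName),
                        Filters.eq("taskTime", taskTime))));
    }

    // $addToSet adds the value only if the array does not already hold it.
    static Bson addUri(String uri) {
        return Updates.addToSet("tasks.$.crawledList", uri);
    }

    // Returns true if the URI was newly added; false if it was already
    // present or if no task matched the filter.
    static boolean addIfAbsent(MongoCollection<Document> collection,
                               String processName, String taskName,
                               String taskTime, String uri) {
        UpdateResult result = collection.updateOne(
                taskFilter(processName, taskName, taskTime), addUri(uri));
        return result.getModifiedCount() > 0;
    }
}
```

With this shape, addToStore would reduce to one call to addIfAbsent, and crawledBefore would only be needed where the caller wants to ask without writing.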

Cheers.
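
On the question of indexes and document structure: one possible alternative, offered only as a sketch, is to store each task as its own document instead of embedding all tasks in one process document. A single MongoDB document is capped at 16 MB, so crawled lists that keep growing inside one process document could eventually hit that limit. The layout, the field name processName, and the class below are assumptions for illustration and do not come from the original post. A unique compound index on the three identifying fields lets the driver find a task directly, and an upsert creates the task document the first time a URI is added.

```java
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.IndexOptions;
import com.mongodb.client.model.Indexes;
import com.mongodb.client.model.UpdateOptions;
import com.mongodb.client.model.Updates;
import org.bson.Document;
import org.bson.conversions.Bson;

// Hypothetical layout: one document per task, for example
// { processName: "ProcessX", taskName: "TaskX",
//   taskTime: "2018-08-09T13:38:58.317Z", crawledList: [ ... ] }
final class TaskPerDocumentStore {

    // Compound key that identifies one task.
    static Bson taskKey() {
        return Indexes.ascending("processName", "taskName", "taskTime");
    }

    // Run once at start-up; creating an index that already exists is a no-op.
    static void ensureIndex(MongoCollection<Document> tasks) {
        tasks.createIndex(taskKey(), new IndexOptions().unique(true));
    }

    // crawledList is now a top-level array, so no positional "$" is needed.
    static Bson addUri(String uri) {
        return Updates.addToSet("crawledList", uri);
    }

    // Adds the URI if absent; creates the task document if it does not exist yet.
    static void add(MongoCollection<Document> tasks, String processName,
                    String taskName, String taskTime, String uri) {
        tasks.updateOne(
                Filters.and(
                        Filters.eq("processName", processName),
                        Filters.eq("taskName", taskName),
                        Filters.eq("taskTime", taskTime)),
                addUri(uri),
                new UpdateOptions().upsert(true));
    }
}
```

Whether this layout is actually faster depends on how many tasks and URIs a process accumulates, so it would be worth measuring against the embedded version before switching.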
