Question

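I am trying to write a Java function that inserts a list of words into a collection. I want one document per word, each with a unique field "word". The list of words I want to insert contains many duplicates, so I want my function to insert a document only if the collection does not already contain a document with the same "word" value. If a document with the same "word" value already exists, the function should not change or replace that document, but should move on to inserting the next word in the list.

I created a unique index on the field "word" to avoid duplicate documents, and I catch the duplicate key exception, but I am not sure whether this is the right way to handle the problem.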

    IndexOptions uniqueWord = new IndexOptions().unique(true);
    collection.createIndex(Indexes.ascending("word"), uniqueWord);

    try {
        File file = new File("src/words.txt");
        Scanner scanner = new Scanner(file);

        while (scanner.hasNextLine()) {
            String word = scanner.next();

            Document document = new Document();
            document.put("word", word);

            InsertManyOptions unordered = new InsertManyOptions();
            ArrayList<Document> docs = new ArrayList<>();
            docs.add(document);

            try {
                collection.insertMany(docs, unordered.ordered(false));
            } catch (Exception e) {
                //System.out.println(e.getMessage());
            }

Answer 1

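You wrote:

If a document with the same "word" value already exists, the function should not change or replace that document, but should move on to inserting the next word in the list.

This rules out atomic operations such as findOneAndUpdate or findOneAndReplace with upsert: true.

Instead, I think your options are limited to a pre-write check, for example: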

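    // count() returns a long, so compare it with zero explicitly
    if (collection.count(Filters.eq("word", "...")) == 0L) {
        // insert: no document exists for this word yet
    } else {
        // ignore because there is already a document for this word
    }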

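If your writer is multi-threaded, this may be subject to race conditions: while one thread is reacting to a zero result from collection.count(), another thread may manage to write an entry for that same word. findOneAndReplace is atomic, so it is not prone to this problem.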
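To make that interleaving concrete, here is a minimal sketch that replays it step by step on a single thread. It does not talk to MongoDB: the ArrayList is a hypothetical in-memory stand-in for a collection without a unique index, and the count helper and the class name are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class CheckThenInsertRace {
    // Hypothetical stand-in for collection.count(Filters.eq("word", word)).
    static long count(List<String> store, String word) {
        return store.stream().filter(word::equals).count();
    }

    // Replays one unlucky interleaving of two writers, in order, on one thread.
    static List<String> interleavedWriters(String word) {
        List<String> store = new ArrayList<>(); // in-memory stand-in, no unique index

        // Both writers run the pre-write check before either has inserted.
        boolean writerASeesNone = count(store, word) == 0L;
        boolean writerBSeesNone = count(store, word) == 0L;

        // Both then act on a result that is already stale.
        if (writerASeesNone) store.add(word);
        if (writerBSeesNone) store.add(word);
        return store;
    }

    public static void main(String[] args) {
        System.out.println(interleavedWriters("apple")); // prints [apple, apple]
    }
}
```

With the unique index from the question in place, the second insert would be rejected with a duplicate key error rather than stored, so the writer that loses the race would still need to handle that exception.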

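I would suggest using findOneAndReplace with FindOneAndReplaceOptions.upsert == true. This has the same end result as ignoring a document that has already been written (although it replaces that document with an identical one), and it is probably safer than a pre-write "if exists" check.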
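The property that matters here is that the check and the write happen as one step. As a rough illustration only, the sketch below uses ConcurrentHashMap.putIfAbsent as an in-memory stand-in for an atomic server-side operation. It is an analogy, not driver code: unlike findOneAndReplace, putIfAbsent leaves an existing entry untouched instead of replacing it with an identical one. The class name, method name, and writer labels are assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class AtomicInsertIfAbsent {
    // Several writers insert the same word list into one shared store.
    static Map<String, String> insertConcurrently(List<String> words, int writers) {
        // Hypothetical stand-in for the collection, keyed by the "word" field.
        Map<String, String> store = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(writers);
        for (int i = 0; i < writers; i++) {
            final String writerName = "writer-" + i;
            pool.submit(() -> {
                for (String word : words) {
                    // Check and write are a single atomic step, so no
                    // interleaving of writers can store a word twice.
                    store.putIfAbsent(word, writerName);
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("writers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return store;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("apple", "pear", "apple", "fig");
        System.out.println(insertConcurrently(words, 4).keySet().size()); // prints 3
    }
}
```

Which writer ends up recorded for each word varies from run to run, but every word is stored exactly once.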

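Update: your edited question implies that you are "inserting many", but on each pass through the loop you insert only one document (albeit via collection.insertMany()), so the suggestion above still stands. For example: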

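    // hasNext() pairs with next(); hasNextLine() can be true when no token is left
    while (scanner.hasNext()) {
        String word = scanner.next();

        // insert only when no document with this "word" value exists yet
        if (collection.count(Filters.eq("word", word)) == 0L) {
            Document document = new Document();
            document.put("word", word);

            collection.insertOne(document);
        }
    }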
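Because the word list itself contains many duplicates, one optional refinement is to drop the repeats in memory before touching the database, so that each distinct word triggers at most one check and one insert. This is a sketch under assumptions rather than part of the approach above: the class and method names are hypothetical, and an inline string stands in for src/words.txt. Words already stored by an earlier run would still need the database-level check.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Scanner;
import java.util.Set;

class DistinctWords {
    // Reads whitespace-separated tokens and keeps the first occurrence of each word.
    static List<String> distinctWords(Scanner scanner) {
        Set<String> seen = new LinkedHashSet<>(); // preserves first-seen order
        while (scanner.hasNext()) {               // hasNext() matches the next() call below
            seen.add(scanner.next());
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // Inline text stands in for the word file.
        Scanner scanner = new Scanner("apple pear\napple fig\npear\n");
        System.out.println(distinctWords(scanner)); // prints [apple, pear, fig]
    }
}
```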

MongoDB Java driver: insert a document if it does not exist, otherwise do nothing
