Merge all duplicate and standalone documents from different collections in MongoDB

Time: 2017-05-31 09:12:12

Tags: mongodb join duplicates aggregation-framework union

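I have been reading about using $lookup (aggregation) in MongoDB to do what I think is a simple process. I don't know whether this is the right approach, since I am a beginner with MongoDB. I have two collections named five_million1_1 and five_million2_1. Each collection contains some records that are duplicated in the other (same article_url) and some that appear only once. I want to merge the records that share an article_url into one, keep the remaining single records as they are, and store everything in a single collection. I tried this and this, but both work within a single collection.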
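One possible direction, sketched here rather than taken from the question: $lookup joins each document to matching documents, while the goal described above is closer to a union of both collections followed by a group on article_url. The pipeline below assumes MongoDB 4.4 or later (where the $unionWith stage exists, which was not available when this was asked) and assumes every document has a non-empty surface_keywords array, because a plain $unwind drops documents whose array is empty or missing. The output collection name five_million_merged and the temporary field original_id are hypothetical.

```javascript
// Sketch only: assumes MongoDB 4.4+ and non-empty surface_keywords arrays.
const pipeline = [
  // Append every document of the second collection to the stream.
  { $unionWith: "five_million2_1" },
  // One output document per keyword entry.
  { $unwind: "$surface_keywords" },
  // Regroup by URL, collecting the keywords of all duplicates.
  {
    $group: {
      _id: "$article_url",
      original_id: { $first: "$_id" }, // hypothetical temporary field
      nyt_article_year: { $first: "$nyt_article_year" },
      surface_keywords: { $push: "$surface_keywords" },
    },
  },
  // Restore the original document shape.
  {
    $project: {
      _id: "$original_id",
      article_url: "$_id",
      nyt_article_year: 1,
      surface_keywords: 1,
    },
  },
  // Write the result to a new collection (hypothetical name).
  { $out: "five_million_merged" },
];

// In the mongo shell this would be run as:
//   db.five_million1_1.aggregate(pipeline, { allowDiskUse: true })
```

Without a preceding $sort, which of the duplicate documents supplies the surviving _id and nyt_article_year through $first is not guaranteed, so this should be checked against the real data before relying on it.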

Collection 1: five_million1_1

{
    "_id" : ObjectId("5921aeadfe329210965ff3d2"),
    "article_url" : "a",
    "nyt_article_year" : 1994,
    "surface_keywords" : [
        {
            "surface_keyword" : "Greenwich",
            "entity_score" : 0.14455
        },
        {
            "surface_keyword" : "Frank Oz",
            "entity_score" : 0.60855
        }
    ]
}
{
    "_id" : ObjectId("5921aea4fe329210965ff3d1"),
    "article_url" : "b",
    "nyt_article_year" : 1995,
    "surface_keywords" : [
        {
            "surface_keyword" : "capital gain",
            "entity_score" : 0.43096
        },
        {
            "surface_keyword" : "pro forma",
            "entity_score" : 0.25205
        }
    ]
}

Collection 2: five_million2_1

{
    "_id" : ObjectId("5921aeadfe329210965ff4d5"),
    "article_url" : "a",
    "nyt_article_year" : 1994,
    "surface_keywords" : [
        {
            "surface_keyword" : "dhaka",
            "entity_score" : 0.14359
        },
        {
            "surface_keyword" : "Frank",
            "entity_score" : 0.60807
        }
    ]
}


{
    "_id" : ObjectId("5921aea4fe329210965ff3c1"),
    "article_url" : "c",
    "nyt_article_year" : 1996,
    "surface_keywords" : [
        {
            "surface_keyword" : "capital gains",
            "entity_score" : 0.43096
        },
        {
            "surface_keyword" : "pro formas",
            "entity_score" : 0.25205
        }
    ]
}

Expected result

{
    "_id" : ObjectId("5921aeadfe329210965ff3d2"),
    "article_url" : "a",
    "nyt_article_year" : 1994,
    "surface_keywords" : [
        {
            "surface_keyword" : "Greenwich",
            "entity_score" : 0.14455
        },
        {
            "surface_keyword" : "Frank Oz",
            "entity_score" : 0.60855
        },
        {
            "surface_keyword" : "dhaka",
            "entity_score" : 0.14359
        },
        {
            "surface_keyword" : "Frank",
            "entity_score" : 0.60807
        }
    ]
}

{
    "_id" : ObjectId("5921aea4fe329210965ff3d1"),
    "article_url" : "b",
    "nyt_article_year" : 1995,
    "surface_keywords" : [
        {
            "surface_keyword" : "capital gain",
            "entity_score" : 0.43096
        },
        {
            "surface_keyword" : "pro forma",
            "entity_score" : 0.25205
        }
    ]
}
{
    "_id" : ObjectId("5921aea4fe329210965ff3c1"),
    "article_url" : "c",
    "nyt_article_year" : 1996,
    "surface_keywords" : [
        {
            "surface_keyword" : "capital gains",
            "entity_score" : 0.43096
        },
        {
            "surface_keyword" : "pro formas",
            "entity_score" : 0.25205
        }
    ]
}
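
To make the intended merge rule concrete, here is a minimal in-memory sketch in plain JavaScript. It is not a MongoDB query: the function name mergeByArticleUrl is hypothetical, the _id values are plain strings standing in for ObjectId, and the rule it assumes (keep the first document seen for each article_url and append the surface_keywords of later duplicates) is inferred from the expected result above.

```javascript
// Hypothetical helper: merge documents from several arrays by article_url.
function mergeByArticleUrl(...collections) {
  const merged = new Map(); // article_url -> merged document
  for (const docs of collections) {
    for (const doc of docs) {
      const existing = merged.get(doc.article_url);
      if (existing === undefined) {
        // First time this URL is seen: copy so the inputs are not mutated.
        merged.set(doc.article_url, {
          ...doc,
          surface_keywords: [...doc.surface_keywords],
        });
      } else {
        // Duplicate URL: keep the first _id and year, append the keywords.
        existing.surface_keywords.push(...doc.surface_keywords);
      }
    }
  }
  return [...merged.values()];
}

// Sample data copied from the question (_id shown as plain strings).
const fiveMillion1 = [
  { _id: "5921aeadfe329210965ff3d2", article_url: "a", nyt_article_year: 1994,
    surface_keywords: [
      { surface_keyword: "Greenwich", entity_score: 0.14455 },
      { surface_keyword: "Frank Oz", entity_score: 0.60855 },
    ] },
  { _id: "5921aea4fe329210965ff3d1", article_url: "b", nyt_article_year: 1995,
    surface_keywords: [
      { surface_keyword: "capital gain", entity_score: 0.43096 },
      { surface_keyword: "pro forma", entity_score: 0.25205 },
    ] },
];
const fiveMillion2 = [
  { _id: "5921aeadfe329210965ff4d5", article_url: "a", nyt_article_year: 1994,
    surface_keywords: [
      { surface_keyword: "dhaka", entity_score: 0.14359 },
      { surface_keyword: "Frank", entity_score: 0.60807 },
    ] },
  { _id: "5921aea4fe329210965ff3c1", article_url: "c", nyt_article_year: 1996,
    surface_keywords: [
      { surface_keyword: "capital gains", entity_score: 0.43096 },
      { surface_keyword: "pro formas", entity_score: 0.25205 },
    ] },
];

const result = mergeByArticleUrl(fiveMillion1, fiveMillion2);
console.log(JSON.stringify(result, null, 2));
```

With the sample data this produces three documents, for article_url "a", "b" and "c", and the document for "a" carries all four keyword entries. For collections of this size the same rule would have to be expressed on the server side, since loading everything into memory is unlikely to be practical.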

0 answers:

No answers